Data communication and data processing — a basis for definition


Neal B. Seitz

The absence of a precise method of distinguishing data communication from data processing functions is complicating the Federal Communications Commission's Second Computer Inquiry and creating a climate of uncertainty in the teleprocessing industry. This article proposes a simple mathematical criterion for distinguishing between these two classes of functions, illustrates its application, and discusses associated regulatory policy issues. Mr. Seitz is with the National Telecommunications and Information Administration, Institute for Telecommunication Sciences, Boulder, CO, 80033, USA.

To define the proposed criterion for distinguishing data communication and data processing functions, it is first necessary to define precisely the term 'function', and to show how this definition applies to the interactions between the users and the supplier of a service. I define the term 'function' in the mathematical sense: ie a function is a set of ordered pairs of elements (x, y) such that one and only one value y corresponds to every argument x. The set of possible arguments is termed the 'domain' of the function; the set of possible values is termed the 'range' of the function. These terms are illustrated in Figure 1, which portrays the function as a machine which translates arguments into values.

The above definition can be applied directly to the case of user-supplier interactions by regarding the arguments as user inputs to the supplier; the values as supplier outputs to the user; and the supplier system as a finite-state machine (FSM) which transforms inputs into outputs. In general, such a machine will be capable of performing many different input/output transformations (or functions), each corresponding to a particular internal state (Figure 2). The service the machine provides to the user comprises the set of all input/output functions it performs.

As a very simple example of these concepts, consider the case of a private line teletypewriter-to-teletypewriter service using Baudot terminals. The user inputs consist of operator key depressions at the source terminal; the supplier outputs consist of printed characters at the destination terminal; and the service consists of transforming key depressions into printed characters, in this case at a geographically remote location. The Baudot code is a 5-bit code, which means that only 2^5, or 32, distinct characters can be represented. To enable transmission of a complete alphanumeric character set, plus punctuation marks, etc, the Baudot terminals employ 'letters/figures shift' characters. When the user enters a 'letters shift' (↓) character, the terminals interpret and print all subsequent key depressions as letter inputs until a 'figures shift' (↑) character is entered; and conversely. The letters/figures shift characters are not communicated to the destination user.
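The shift mechanism described above can be sketched as a two-state machine. This is a minimal illustration only: the key identifiers (`K1`, `K2`, `K3`, `LTRS`, `FIGS`) and the two character tables are hypothetical and are not the actual Baudot code assignments.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical key identifiers and character tables, chosen only for
# illustration; they are NOT the real Baudot code assignments.
LETTERS = {"K1": "A", "K2": "B", "K3": "C"}
FIGURES = {"K1": "1", "K2": "2", "K3": "3"}
TABLES = {"letters": LETTERS, "figures": FIGURES}

# State-changing inputs: they select a function but produce no output.
SHIFTS = {"LTRS": "letters", "FIGS": "figures"}


@dataclass
class TeletypeFSM:
    state: str = "letters"

    def press(self, key: str) -> Optional[str]:
        """Process one key depression; return the printed character, if any."""
        if key in SHIFTS:
            self.state = SHIFTS[key]  # change state, print nothing
            return None
        return TABLES[self.state][key]  # apply the current state's function


def run(keys):
    """Feed a sequence of key depressions to a fresh machine."""
    fsm = TeletypeFSM()
    printed = (fsm.press(key) for key in keys)
    return "".join(ch for ch in printed if ch is not None)


print(run(["K1", "K2", "FIGS", "K1", "K3", "LTRS", "K3"]))  # AB13C
```

Within either state the table is a one-to-one mapping from key depressions to printed characters; the shift keys appear in neither table, which mirrors the statement that they are not communicated to the destination user.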

0308-5961/80/010049-14 $02.00 © 1980 IPC Business Press


Figure 1. Mathematical definition of a function. (Function F maps each argument x in the domain X to a value y in the range Y.)

The Baudot teletypewriter-to-teletypewriter system described above can be regarded as a finite-state machine having two internal states: the 'letters state', in which it performs the function of transforming key depressions into printed letters; and the 'figures state', in which it performs the function of transforming key depressions into printed figures. Each function can be formally defined by an input/output table which maps key depressions into the corresponding printed characters (in each case, a one-to-one mapping); and the two functions together define the service the system provides. Two incidental remarks on the above definitions are appropriate:

• In the teletypewriter example, the 'letters shift' and 'figures shift' characters do not fall within the domain of either function, since no unique system outputs are associated with these inputs. Such 'state changing' inputs can be classified as 'communication processing' interactions, as discussed in Seitz and McManamon.¹
• The above definitions are easily expanded to accommodate 'multiple-routing' situations, where a single input elicits outputs at more than one location, by defining a separate function for each output location. Pictorially, this approach expands the service-defining table of Figure 2 in a third dimension (output location).

1 N.B. Seitz and P.M. McManamon, Digital communication performance parameters for proposed Federal Standard 1033, Volume I, Standard Parameters, Report No 78-4, National Telecommunications and Information Administration, Washington DC, May 1978.
2 C.E. Shannon, 'A mathematical theory of communication', Bell System Technical Journal, Vol 27, July 1948.

Figure 2. Finite state machine performance adaptation. (The FSM defining table is indexed by internal state 1, 2, 3 ... q and by input set X, the domain; its entries are drawn from output set Y, the range. In a given state q the FSM performs the function y = F(x), transforming input x into output y.)

A key element of the mathematical definition of a function is the property of single-valuedness; ie a single unique output is associated with each input. The criterion proposed here is based on the fact that this property does not hold in reverse; ie given a properly defined function, it may or may not be true that a single unique input is associated with each output. One can divide the set of all discrete functions into two mutually exclusive categories on this basis:

• Functions in which a single, unique input is associated with each output. For all functions in this category, the input is uniquely derivable from the output (assuming no errors); and the purpose of such functions must be either to transfer the input information to the output, or to change its form of representation. It is proposed that all functions in this category be defined as 'data communication' functions.
• Functions in which more than one input is associated with some or all of the outputs. (Note that there is still one unique output associated with each input.) For all functions in this category, the input is not uniquely derivable from the output; instead, the function processes or reduces the input information in a useful way, to produce some new information. It is proposed that all functions in this category be defined as 'data processing' functions.
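For a function given as a finite input/output table, the two categories can be told apart by checking whether any output is shared by more than one input. The sketch below is illustrative only; the function name `classify` and both example tables are hypothetical.

```python
def classify(function_table):
    """Classify a finite function given as a dict {input: output}.

    A dict already guarantees single-valuedness (one output per input),
    so only the reverse direction needs to be checked.
    """
    outputs = list(function_table.values())
    is_one_to_one = len(set(outputs)) == len(outputs)
    return "data communication" if is_one_to_one else "data processing"


# Hypothetical code conversion: every input symbol has its own output symbol.
code_conversion = {"a": "01", "b": "10", "c": "11"}

# Hypothetical parity function: several inputs share each output.
parity = {n: n % 2 for n in range(8)}

print(classify(code_conversion))  # data communication
print(classify(parity))           # data processing
```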

TELECOMMUNICATIONS POLICY March 1980

The proposed distinction between data communication and data processing functions can be stated more precisely in terms of the entropy concept from information theory.² Given a set of possible input (or source) symbols $X = \{x_1, x_2, x_3, \ldots, x_n\}$, each having a probability of occurrence $p(x_i)$, the information conveyed by the occurrence of any given symbol $x_i$ is defined as:

$$I(x_i) = \log_2(1/p(x_i)) = -\log_2 p(x_i)$$

As one would expect, symbols with a low probability of occurrence convey more information than symbols with a high probability of occurrence. The logarithmic function essentially normalizes the information measure so that, for example, two randomly-chosen symbols convey twice as much information as one. The entropy of the source is defined as the average information conveyed per source symbol:

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$$

This averaging process 'weights' the information content of each source symbol by its probability of occurrence. A similar entropy expression can be written for the average information conveyed per output (received) symbol, H(Y). The input and output entropies are related by a conditional entropy term which Shannon called 'equivocation'.³ The equivocation of a function is the average uncertainty about an input symbol $x_i$, given the corresponding output symbol $y_j$:

$$H(X/Y) = -\sum_{i,j} p(x_i, y_j) \log_2 p(x_i/y_j)$$

For single-valued functions, input entropy H(X), output entropy H(Y), and equivocation H(X/Y) can be related by the equation:

$$H(Y) = H(X) - H(X/Y)$$

The proposed defining property for data communication functions is:

$$H(X) = H(Y) \quad \text{so that} \quad H(X/Y) = 0$$

In words, the intention of data communication functions is to preserve the entropy of the source; ie to transfer the input information to the output without introducing uncertainty or equivocation.

A loss in entropy is a gain in order. A basic characteristic of data processing is that it increases order, making input data more structured, more organized, or more condensed, and therefore less random. On this basis, it seems reasonable to define data processing functions as essentially complementary to data communication functions; ie data processing functions are functions for which:

$$H(X) > H(Y) \quad \text{so that} \quad H(X/Y) > 0$$

In words, the intention of data processing functions is to reduce the entropy of the source in a defined way. The equivocation term H ( X / Y ) is a measure of increased order, rather than uncertainty, in this context. Thus, the proposed criterion divides all discrete functions into two mutually exclusive categories - entropy-preserving functions, which are associated with data communication; and entropy-reducing functions, which are associated with data processing.
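The relation H(Y) = H(X) - H(X/Y) for single-valued functions is quoted above without derivation. A brief sketch of why it holds, assuming only the standard chain rule for joint entropy (the symbols $H(X,Y)$ and $H(Y/X)$ are introduced here for the purpose and do not appear in the original):

```latex
\begin{align*}
H(X,Y) &= H(X) + H(Y/X) = H(Y) + H(X/Y) \\
H(Y/X) &= 0 \quad \text{(single-valued: } y \text{ is fully determined by } x\text{)} \\
\Rightarrow \quad H(Y) &= H(X) - H(X/Y)
\end{align*}
```

Since $H(X/Y) \ge 0$, a single-valued function can never raise the entropy of its input, and $H(X/Y) = 0$ exactly when every output of non-zero probability arises from a single input, which is the one-to-one case.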

3 Ibid.

Application examples

The definitions proposed above will be useful only to the extent that they are both consistent with previous definitions and intuitive expectations, and yet sufficiently precise to be used in classifying functions within actual services. Presented below are preliminary application results which indicate that the proposed definitions do satisfy these two conditions.

Consider the first condition. In its opening notice on the Second Computer Inquiry,⁴ the Federal Communications Commission (FCC) asserted that:

"The generic characteristic of the communication function is that the semantic content is not changed at the completion of a given process."

Other sources have proposed a similar criterion for identifying a function as data communication. For example, O'Dwyer and Redderson⁵ make the following statement:

"Data Communications is ... a function. It is the function of transporting encoded information from one point to another ... It does not, however, change the inherent informational content."

The data communication definition proposed herein is consistent with these definitions, but makes them more precise by associating the general concept of 'semantic content' or 'informational content' with a specific, measurable quantity: the source entropy. The FCC proposed to define the data processing function as follows:

"The use of a computer for the purpose of processing information wherein (a) the semantic content, or meaning, of input data is in any way transformed, or (b) where the output data constitute a programmed response to input data."⁶

O'Dwyer and Redderson propose a similar definition for data processing:⁷

"Data processing is a function. It is the function of operating upon information to increase its worth to the end user. The action of processing (or operating) changes the inherent information content of the data, so as to add value for the user."

Again, the definition I have proposed makes the general definitions more precise, in this case by providing a specific mathematical test (H(X) > H(Y)) for determining when the 'semantic content' or 'inherent informational content' of input information has been 'transformed' by a function.

4 Notice of Inquiry and Proposed Rule Making, Docket No 20828, FCC 76-745, 41799 41FR, Federal Communications Commission, Washington DC, 1976.
5 D.S. O'Dwyer and W.F. Redderson, 'User requirements for data communications in the United States', Proceedings of the Third International Conference on Computer Communication, Toronto, Canada, 3-6 August 1976.
6 Op cit, Ref 4. The Commission explained in a footnote that condition (b) was intended to bring services such as process control and proprietary information retrieval within the ambit of the definition of data processing. These two service types are explicitly identified as data processing in the Tentative Decision, as discussed below.
7 Op cit, Ref 5.

The entropy test is applied here to three simple functions which are 'generally accepted' as data processing:

• Adding two input numbers to determine their sum.
• Placing two input numbers in numerical order (a rudimentary form of 'merge').
• Placing an input number in one of several categories based on its value (a rudimentary form of 'sort').

Each function reduces the entropy of input information in one (or both) of two ways:

• By making the range (output set) smaller than the domain (input set). This reduces entropy by increasing the probability $p(y_j)$ of an output symbol's occurrence.
• By making the output symbol probabilities more unequal than those of the input. This reduces entropy by 'biasing' the output towards particular values.
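Both mechanisms follow from the standard upper bound on the entropy of a discrete distribution. As a short worked statement, assuming $n$ equally likely input symbols and an output set of $m$ symbols (the letters $n$ and $m$ are notation introduced here):

```latex
\begin{align*}
H(X) &= \log_2 n \\
H(Y) &\le \log_2 m, \quad \text{with equality only when all } m \text{ outputs are equally likely}
\end{align*}
```

A smaller range ($m < n$) lowers the ceiling $\log_2 m$ below $H(X)$, and unequal output probabilities push $H(Y)$ below that ceiling.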

Consider, first, the addition function. For simplicity, assume that each number to be added consists of a decimal digit which takes values between 0 and 9 with equal probability. Each set of two digits may be regarded as a single input symbol $x_i$, since the addition operation requires two input digits. There are then 10^2, or 100, equally likely elements in the input set, and the source entropy is:

$$H(X) = -\log_2(1/100) = 6.64 \text{ bits/symbol}$$

Consider now the output set of the addition function. Adding two numbers between 0 and 9 will produce a sum between 0 and 18, so that there are 19 elements in the output set. These numbers are not equally probable since, for example, there are four input symbols, (0,3), (1,2), (2,1), and (3,0), which produce the output symbol (03), but only one input symbol (0,0) which produces the output symbol (00). In this case the output entropy is:

$$H(Y) = -\sum_{j=1}^{19} p(y_j) \log_2 p(y_j) = 4.03 \text{ bits/symbol}$$

Since H(X) was 6.64 bits/symbol, the addition function has produced an entropy reduction of 2.61 bits/symbol. This is a case where the entropy reduction is a result of both factors cited above, ie a smaller range and more unequal symbol probabilities. If the 19 output symbols were equally probable, the output entropy would be:

$$H(Y) = -\log_2(1/19) = 4.25 \text{ bits/symbol}$$
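The entropy figures quoted for the addition function, and for the merging and sorting functions treated in the paragraphs that follow, can be checked by direct enumeration. The sketch below assumes the article's setup of 100 equally likely digit pairs; the helper names `entropy` and `output_entropy` are illustrative, and H(X/Y) is obtained as H(X) - H(Y), which is valid for single-valued functions.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


def output_entropy(func, inputs):
    """H(Y) for y = func(x) when every x in `inputs` is equally likely."""
    counts = Counter(func(x) for x in inputs)
    total = sum(counts.values())
    return entropy(c / total for c in counts.values())


# 100 equally likely input symbols: all ordered pairs of decimal digits.
digit_pairs = list(product(range(10), repeat=2))
h_x = log2(len(digit_pairs))

examples = {
    # Sum of the two digits (19 possible outputs).
    "addition": lambda pair: pair[0] + pair[1],
    # Digits placed in numerical order, largest first (55 possible outputs).
    "merging": lambda pair: (max(pair), min(pair)),
    # Decade (0-9, 10-19, ...) of the two-digit number formed by the pair.
    "sorting": lambda pair: (10 * pair[0] + pair[1]) // 10,
}

for name, func in examples.items():
    h_y = output_entropy(func, digit_pairs)
    print(f"{name}: H(X)={h_x:.2f}  H(Y)={h_y:.2f}  H(X/Y)={h_x - h_y:.2f}")
```

Run as written, this reports output entropies of 4.03, 5.74, and 3.32 bits/symbol for the three functions, against a source entropy of 6.64 bits/symbol in each case.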

Consider now the 'merging' function. Again it is assumed that the function input consists of two decimal digits between 0 and 9, but in this case, the effect of the function is simply to place the digits in numerical order (largest digit first). Since the input set (domain) is the same as that of the addition function, the source entropy is again log2(100) = 6.64 bits/symbol. The output set (range) of the merging function contains 55 elements, of which ten, the pairs (0,0), (1,1), (2,2), ..., (9,9), occur with probability 1/100; and 45 (all other pairs) occur with probability 2/100. The output entropy is therefore:

$$H(Y) = -10(1/100)\log_2(1/100) - 45(2/100)\log_2(2/100) = 5.74 \text{ bits/symbol}$$

It is relatively easy to calculate the conditional entropy H(X/Y) in this case; it is:

$$H(X/Y) = -\sum_{i,j} p(x_i, y_j) \log_2 p(x_i/y_j) = 10(1/100)\log_2(1) + 90(1/100)\log_2(2) = 0.9 \text{ bits/symbol}$$

This corresponds to the difference in entropy between H(X) and H(Y). Again, the effect of the function has been to reduce source entropy.

The third function to be considered is the 'sorting' function. As a very simple example of such a function, take the case where the input is a two-digit decimal number, uniformly distributed in the range 00-99; and the output is a one-digit decimal number which indicates within which of the ten 'decades' (0-9, 10-19, ..., 90-99) the input number lies.⁸ Since each input symbol is equally likely, the source entropy is again:

$$H(X) = -\log_2(1/100) = 6.64 \text{ bits/symbol}$$

The output symbols are also equally likely, but there are ten rather than 100 possibilities. Thus:

$$H(Y) = -\log_2(1/10) = 3.32 \text{ bits/symbol}$$

The reduction in entropy, H(X/Y), is thus 3.32 bits/symbol.

The purpose in presenting these example calculations is to illustrate the nature of the entropy criterion. Actual calculations are not normally required to identify data processing functions, since the determining factor is entropy reduction per se, and not 'how much' the entropy is reduced.

Turning to an application of the proposed definitions in classifying functions within actual offered services, two recent common carrier offerings which have created substantial controversy are the Dataspeed® 40/4 terminal offering of American Telephone and Telegraph (AT&T);⁹ and the Room Status and Selection feature of the New York Telephone Company's Dimension 2000® PBX offering.¹⁰ The FCC describes the editing function of the Dataspeed® 40/4 as follows:

"When the Dataspeed 40/4 is used, the operator can view the results of his typing on the CRT display prior to transmission. Keystroke errors and omissions can be corrected by moving an operator-controlled cursor to the appropriate point in the text. Text ranging from a single to several lines can be deleted and the text contained in the memory will be made automatically to fill in the gap created by the deletion, and the resultant edited text will be displayed on the CRT. Or, several omitted characters can be inserted into the middle of a line, and the Dataspeed 40/4 will automatically correct the text to accommodate the newly entered characters. In either case the resultant edited text will be stored in the local memory until the editing function is completed. It will then be transmitted to the host computer in one burst, free of operator errors."

First consider the simplest case, where the editing function to be performed consists of adding a single character to a line of text. The input to the function is the operator keystroke, which identifies the character to be added; the output is the CRT display of the newly added character.¹¹ This function is a data communication function under the proposed definitions, since there is a one-to-one correspondence between the character displayed and the character typed. What the terminal has actually performed here is the function of storage, which can be regarded as 'communication in time'.¹² The same basic conclusions hold in the case where a string of characters, rather than a single character, is added.

The second editing function to be considered is the replacement of one or more displayed characters with other characters (text correction). Again, the input to the function is the character or characters to be added; the output is the corresponding displayed character(s); and the function is data communication, since there is a one-to-one correspondence between the displayed and typed characters.

8 Such a function might be used, for example, in placing statistical data in appropriate 'class intervals' to create a histogram. An obvious way of accomplishing the function would be to divide the input number by 10 and take the integer value of the result.
9 Memorandum Opinion and Order in the matter of AT&T revisions to tariffs FCC Nos 260 and 267 relating to Dataspeed 40®, FCC 76-1199, 43664, Federal Communications Commission, Washington DC, 1977.
10 IBM, Petition for Declaratory Ruling in the Matter of New York Telephone Co, Tariff No PSC No 800, Dimension® PBX, Feature Package 9, filed before the FCC July 1978. New York Telephone, Opposition to CBEMA's Petition for Declaratory Ruling and Other Relief in the Matter of New York Telephone Co, Tariff No PSC No 800, Dimension® PBX, filed before the FCC July 1978.
11 Cursor positioning and selection of the 'add character' function are 'state changing' operations similar to the 'letters/figures shift' operation described above.
12 Op cit, Ref 5.
The function of line deletion is slightly more difficult to discuss conceptually, since its 'output' is the omission of data previously stored on the CRT display.¹³ Assuming that the 'delete line' function (or state) has been selected, the input consists of a character sequence which identifies the line to be deleted; and the output consists of the revised screen display, with the specified line deleted. Once again, there is a one-to-one correspondence between the output information (line actually deleted) and the input information (line to be deleted); and the function is data communication under the proposed definitions.

The second service feature we consider is the Room Status and Selection feature, a part of Feature Package 9 of New York Telephone's Dimension 2000® PBX offering. This service feature uses the PBX to support certain hotel management operations, including (as described in the tariff filing):

"The capability to store and display the occupancy and cleaning status and type number of each guest room, facilitating housekeeping management, maid locating and room selection."

The room selection function is further described as follows:¹⁴

"It displays the condition of any room, or the numbers of all rooms ready for sale."

As a hypothetical example of how the latter function might operate, consider its application to a group of 10 similar rooms. Input to the function would consist of the current cleaning status ('needs cleaning' or 'cleaned') and occupancy status ('occupied' or 'vacant') of each room. The function output would identify all rooms in the 'cleaned/vacant' state. In principle, the status of each room can be communicated by two bits of input information, and a total of 20 bits would be sufficient to specify the status of all ten rooms. Assuming that each 20-bit input 'vector' is equally likely, the input entropy is:¹⁵

$$H(X) = -\log_2(1/2^{20}) = 20 \text{ bits/symbol}$$

Since each room is either in the 'cleaned/vacant' state or is not, each possible output of the function conveys at most 10 bits of information, and the output entropy is bounded by:

$$H(Y) \le -\log_2(1/2^{10}) = 10 \text{ bits/symbol}$$

13 Information can be conveyed by the absence, as well as by the presence, of a signal, so that this method of specifying a function's output is technically valid.
14 Lodging magazine, Bell System advertisement on Dimension® 2000 PBX Feature Package 9, June 1978, pp 6-7.
15 I assume the room status information is accompanied by a room identifier on both input and output, and therefore disregard the room identifiers in performing the entropy calculations.
16 Feature Package 9 does contain such a function.

Thus, the specified function reduces source entropy by at least 10 bits/symbol, and is a data processing function under the proposed definitions.

The above conclusion deserves two qualifying remarks. First, the reason that the room selection function constitutes data processing is that not all of the input information is output. The specific information not output is the cause of room non-availability, ie needs cleaning, occupied, or both. The function could be changed from data processing to data communication very simply, by identifying the status of all rooms (rather than only those in the cleaned/vacant state) in the output. The revised function would leave the job of selecting available rooms (a sorting function) up to the user.¹⁶

It is also important to note that the above conclusion applies to the specific function of room selection, not to the Feature Package 9 service as a whole. From its advertised capabilities, Feature Package 9 appears to be a 'hybrid service' under the current Commission Rules, ie a service which combines communication (message switching) and data processing.
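The room selection example can be explored by enumeration on a smaller group of rooms. The sketch below is an illustration under stated assumptions: the two-bit (cleaned, vacant) encoding, the function names, and the choice of four rooms (so that every input vector can be listed quickly) are all hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2

# Hypothetical encoding: each room is a (cleaned, vacant) pair of bits.
ROOM_STATES = list(product([0, 1], repeat=2))


def entropy_of_outputs(func, inputs):
    """H(Y) in bits for y = func(x), with every x in `inputs` equally likely."""
    counts = Counter(func(x) for x in inputs)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def select_available(rooms):
    """Room selection: flag only the rooms that are both cleaned and vacant."""
    return tuple(int(cleaned and vacant) for cleaned, vacant in rooms)


def report_all(rooms):
    """Revised function: output the full status of every room."""
    return tuple(rooms)


n_rooms = 4  # kept small so that all 4**n_rooms input vectors can be listed
inputs = list(product(ROOM_STATES, repeat=n_rooms))

h_x = log2(len(inputs))                                # 2 bits per room
h_select = entropy_of_outputs(select_available, inputs)
h_report = entropy_of_outputs(report_all, inputs)

print(f"H(X) = {h_x:.2f} bits")
print(f"room selection: H(Y) = {h_select:.2f} bits ({h_select / n_rooms:.3f} per room)")
print(f"full status report: H(Y) = {h_report:.2f} bits")
```

With equally likely input vectors, a room is cleaned and vacant one time in four, so the enumerated output entropy is about 0.81 bits per room rather than a full bit. Scaled to ten rooms this gives roughly 8.1 bits, inside the 10-bit figure used in the text, so the entropy reduction is at least as large as stated. The full status report preserves all of the input entropy, consistent with the remark that reporting the status of every room turns the function into data communication.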


Regulatory policy issues

There are two interrelated policy issues associated with the possible use of the criterion proposed above in establishing regulatory boundaries: an issue of semantic accuracy and an issue of application context. The applied logic text Thinking Straight contains a perceptive comment on the nature of definition proposals which aptly introduces the first issue:¹⁷

"Though a definition proposal is neither true nor false, and therefore cannot be tested, it is by no means immune to criticism, for it may be extremely consequential. It embodies a decision to adopt certain meanings for certain terms, and the relations between those meanings and other meanings already established may make a great deal of difference, especially in law and politics."

Figure 3. Definition impact issue. DC = Data communication. DP = Data processing.
A. Points of agreement: (1) All DC functions preserve source entropy. (2) All functions that reduce source entropy are DP.
B. Converse statements: (1) All functions that preserve source entropy are DC. (2) All DP functions reduce source entropy.
C. Implications: if A and B are true, H(X) = H(Y) identifies DC and H(X) > H(Y) identifies DP (criterion sufficient); if A is true and B is not, H(X) > H(Y) still identifies DP, but H(X) = H(Y) may be either DC or DP (additional criteria needed).

17 M.C. Beardsley, Thinking Straight: Principles of Reasoning for Readers and Writers, Prentice-Hall, Englewood Cliffs, NJ, 1975.
18 These disadvantages could be mitigated by the flexibility inherent in the service classification scheme suggested below.


The above examples indicate that the proposed definitions for data communication and data processing conform closely to previously established meanings. Nevertheless, these examples are not exhaustive, and there are scientific functions which could be advanced as counter examples (eg matrix inversion). The essence of the semantic accuracy issue is summarized in Figure 3. A survey of the record in the Second Computer Inquiry suggests that very wide agreement could be reached on the two statements set forth under item A: that is, all data communication functions preserve source entropy (by intent), and any function that reduces source entropy (by intent) is data processing. The logical converses of these two statements are set forth under item B: ie all entropy-preserving functions are data communication, and all data processing functions reduce source entropy. These are the statements which must be examined critically in resolving the issue of semantic accuracy. Neither statement can be proven to be true or false by deductive means; any such effort would beg the question. The issue is not whether these statements are true or false in an absolute sense, but whether they should be regarded as true or false for regulatory purposes. The most appropriate basis for resolving this issue is an assessment of market consequences. The immediate regulatory consequences of accepting or rejecting statements (B) are summarized in Figure 3C. If these statements are accepted, the entropy test alone is sufficient to classify any function as data communication or data processing, and there is no need to develop or consider other criteria. Adopting this approach would simplify the process of regulation by reducing the need for a d hoc proceedings, and would minimize supplier uncertainty about the regulatory status of new service offerings. 
Possible disadvantages would be relative inflexibility and a tendency to place some questionable scientific functions in the data communication category.18 If statements (B) are rejected, the entropy criterion can only identify a subset of the data processing functions (ie those reducing entropy); additional criteria will be needed to distinguish 'entropy-preserving data processing' functions from 'entropy-preserving data communication' functions. Possible benefits of this approach would be regulatory flexibility and a tighter bound on regulatory authority; disadvantages could be increased regulatory complexity and supplier uncertainty. The necessary additional criteria could be developed,

T E L E C O M M U N I C A T I O N S POLICY March 1980

Data communication and data processing - a basis f o r definition All funchons

[

1

Entropy preserving

Enhopy reducing

[

[

r-

1

Source/deshnohon

I Source/deshnohon

Source/deshnohon

dlfferenl

same

d,fferent

I

I

Form some

Form

different

Form

n-

Media

l

Code

Both

I I

I EM

I I

I

Boundary 2 (IBM~

Boundary 3 (FCC- SUP)

EM

Form same

different

L E

F-

I

Form some

Non -

Med~o Code

I

F-I

Both

EM Non-

EM

EM

1

Form some

Form

different

L

I

l SOurce/deshnahon some

Form

d,fferent

r L1 Med,a

Code

Both

EM NonEM

Boundary

! Media

Code

Both

Boundary

I

4

(FCC- NOI)

(AT 8~T)

F i g u r e 4 . C o m m o n c a r r i e r f u n c t i o n classification. DC = Data c o m m u n i c a t i o n s ; DP = Data processing; E = Electromagnetic; Non-Em = Non-electromagnetic.

Elias, 'Minimum times and memories needed to compute the values of a function', Journal of Computer and 19peter

System

Sciences,

pp196-212.

Volume 9, 1974, Output entropy is one such

measure.

Response of International Business Machines Corporation in the M a t t e r of Amendment of Section 64.702 of the Commission's Rules and Regulations (Computer Inquiry), Docket No 20828, filed before the FCC June 1977. CBEMA, Memorandum of Possible Revisions in the Proposed Communications Act of 1978 [FIR 13015), submitted to the 20 IBM,

Chairman of the Subcommittee on Communications, US House of Representatives, October 1978. 2~ A T & T , Comments of American Telephone and Telegraph Company in the Matter of Amendment of Section 64.702 of the Commission's Rules and Regulations (Second Computer Inquiry), Docket No 20828, filed before the FCC June 1977. GTE, Comments of the GTE Service Corporation in the M a t t e r of Amendment of Section 64.702 of the Commission's Rules and Regulations (Second Computer Inquiry), Docket No 20828, filed before the FCC June 1977.

Management, Final Survey Results: the Role of the Computer in Common Carrier Services, Center for C o m m u n i c a t i o n s Management, Ramsey, N J, 1977.

Z2Center for C o m m u n i c a t i o n s

TELECOMMUNICATIONS

based, for example, on some measure of computational complexity; 19 the feasibility of applying such criteria to actual services has not been demonstrated. The market impact of this decision will depend heavily on the regulatory context in which the selected definitions are applied. As discussed below, the Commission's Tentative Decision would do much to reduce that impact, and there appears to be relatively little risk associated with applying the proposed definitions in that context. The classification of functions as entropy-preserving or entropy-reducing is useful in any case.

The second issue to be addressed in connection with possible use of the entropy criterion is the issue of application context. Simply stated, that issue is how, and on whom, the selected definitions for data communication and data processing should be imposed in establishing appropriate boundaries between regulated and unregulated services. The comments of the Computer Inquiry respondents on this issue reflected fundamentally different attitudes towards regulation per se. Representatives of the data processing and interconnect industries recommended broad deregulation in the teleprocessing area, to stimulate competition and innovation. 20 The traditional common carriers argued that they should be given broad flexibility to offer innovative services under regulation, to meet the needs of small users and to ensure wide geographical availability of services. 21 One independent survey suggested that a preponderance of users also favour a permissive regulatory policy with respect to common carrier offerings of data processing services. 22

Figure 4 illustrates one method of comparing the various regulatory proposals in definitional terms. All potential common carrier functions are first divided into two primary categories: entropy-preserving functions and entropy-reducing functions. Each primary category is then divided into ten subordinate categories based on the following criteria:

• Whether the source and destination of the information processed by the function are the same or different (as determined by physical input/output interface).

• Whether the form (code and medium) of the information input and output by the function is the same or different.

• The nature of the input and output form (in the case where the two are the same); or the nature of the form change (in the case where the two are different).

The first criterion distinguishes functions which convey information between different users (eg terminal operators) from functions which serve a single user. This distinction is significant since, as noted above, all storage of information can be regarded as 'communication in time'.

The second criterion distinguishes functions involving form conversion from functions which deal with a single input/output form. The term 'code' refers to the symbolic representation of information (eg ASCII, Baudot); the term 'medium' refers to its physical representation (eg punched paper tape, electromagnetic energy).

The third criterion does two things. In the case where the input and output forms are the same, the criterion identifies the form as electromagnetic (EM) or non-electromagnetic (Non-EM). In the case where the input and output forms are not the same, the criterion identifies the type of form conversion performed, ie media conversion, code conversion, or both.

All of the initial proposals advanced in connection with the Second Computer Inquiry can be regarded as a 'mapping' of regulatory control onto some particular subset of the functions identified in Figure 4. For example:





23 Supplemental Notice of Inquiry and Enlargement of Proposed Rule Making, Docket No 20828, FCC 77-151, 47 CFR Part 64, Federal Communications Commission, Washington DC, 1977.


• The definitional approach proposed in the Commission's opening notice essentially divides the decision tree down the middle, and imposes regulation on everything to the left, and nothing to the right, of the midline (Boundary 1 in Figure 4).

• IBM's proposal, and that of CBEMA, moves the regulation/no regulation dividing line to the left, so that only 'pure transmission' functions (entropy preserving, source/destination different, form same, EM) are subject to regulation (Boundary 2).

• The Commission's third item of inquiry in its Supplemental Notice of Inquiry, 23 ie 'whether the offering of customer-premises equipment which performs any information processing activity, other than basic media conversion, should be considered a communications common carrier activity', postulates Boundary 3.
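The category structure behind Figure 4 can be made concrete with a small enumeration. The sketch below is illustrative only: the class and function names are hypothetical, and because Figure 4 is not reproduced here, the reading of Boundary 1 as "all entropy-preserving functions" is an assumption drawn from the text. It shows that the three criteria yield ten subordinate categories per primary category (two source/destination cases times five form cases), and how Boundaries 1 and 2 select subsets of them.

```python
from dataclasses import dataclass
from itertools import product

# Labels are illustrative; the article gives the criteria, not these names.
ENTROPY = ("preserving", "reducing")
ENDPOINTS = ("same", "different")  # source versus destination
FORMS = (
    "same form, EM",
    "same form, non-EM",
    "media conversion",
    "code conversion",
    "media and code conversion",
)


@dataclass(frozen=True)
class Category:
    entropy: str
    endpoints: str
    form: str


def all_categories():
    """Every combination of primary category and the three criteria."""
    return [Category(e, p, f) for e, p, f in product(ENTROPY, ENDPOINTS, FORMS)]


def regulated_under_boundary_1(c):
    # Assumption: the 'left' half of the tree is the entropy-preserving half.
    return c.entropy == "preserving"


def regulated_under_boundary_2(c):
    # 'Pure transmission': entropy preserving, source/destination different,
    # form same, EM (as stated in the text).
    return c == Category("preserving", "different", "same form, EM")


categories = all_categories()
per_primary = sum(1 for c in categories if c.entropy == "preserving")
print(per_primary)                                              # 10
print(sum(regulated_under_boundary_1(c) for c in categories))   # 10
print(sum(regulated_under_boundary_2(c) for c in categories))   # 1
```

Expressing each proposal as a predicate over the same set of categories is one way to compare the proposals in purely definitional terms, which is the purpose the article assigns to Figure 4.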

Interestingly, AT&T's proposal is basically consistent with Boundary 1, but asserts that 'common carriers must be allowed to engage in the furnishing of processing, including data processing, in providing communication services'. This suggests that the boundary line in question should be moved all the way to the right (Boundary 4), subject to the condition that common carriers would provide only data processing functions which enhance or support associated communication services. AT&T states that 'this use of processing, including data processing, in the provision of common carrier services can be distinguished from the provision of data processing service to the public'. 24

As noted above, the entropy criterion is generally consistent with the definitions proposed by the FCC in its opening notice on the Second Computer Inquiry. The entropy criterion could therefore be advanced as a method for resolving the Computer Inquiry on the relatively narrow basis proposed in that notice; ie:



• Establish a definite line of demarcation between data communication and data processing services.

• Abolish the 'hybrid service' categories.

• Continue the regulatory policies set forth in the first Computer Inquiry (Docket No 16979), ie total regulatory forbearance with respect to data processing services, and maximum separation of common carrier entities offering data processing services (through 'arms-length' subsidiaries).

Such an approach might have some short-term benefits. It would not be advantageous on a long-term basis, however, since it does not address the broader economic and social issues identified in the subsequent record. The comments received by the FCC in the Second Computer Inquiry reveal a broad consensus on two points:

• The revised rules should not preclude the commingling of data communication and data processing in a single 'hybrid' service.

• Regulatory boundaries should not be based solely on 'what is being offered' in a service; they should also consider 'who is offering' the service, ie the nature and market position of the supplier.

The National Telecommunications and Information Administration supports this view, and has further proposed that federal regulation be restricted to common carriers having 'dominant market power'. 25

On 2 July 1979, the FCC released a Tentative Decision and Further Notice of Inquiry and Rule Making on the Second Computer Inquiry. 26 This Tentative Decision outlines a revised regulatory plan which directly implements the two points of consensus noted above. Briefly:

24 Op cit, Ref 21. Apparently, AT&T believes the room selection function, described above, meets the 'communication enhancement' criterion. AT&T's Dataspeed 40/4 offering, described briefly above, appears to perform all functions in the data communication category except code conversion.

25 H. Geller, Statement of Henry Geller, Assistant Secretary for Communications and Information, US Department of Commerce, on S 611 and S 622, before the Communications Subcommittee, US Senate, April 1979.

26 Second Computer Inquiry: Tentative Decision and Further Notice of Inquiry and Proposed Rule Making, Docket No 20828, FCC 79-307, 47 CFR Part 64, Federal Communications Commission, Washington DC, 1979.







• Interstate communications common carriers are divided into two categories - 'underlying' carriers, who own their own transmission facilities; and 'resale' carriers, who lease the transmission facilities they require from underlying carriers on a tariff basis.

• Common carrier services are divided into three primary categories - voice, basic non-voice, and enhanced non-voice. Enhanced non-voice services are distinguished from basic non-voice services by the fact that they use computer processing 'to act on the form, content, code, protocol, etc of the inputted information'.

• Underlying carriers are prohibited from directly providing enhanced non-voice services, but are allowed to do so 'through a separate corporate entity on a resale basis'. Underlying carrier computer facilities 'may not be used for those computer processing applications associated with enhanced non-voice services'.

• Customer premises equipment offered by communications common carriers is divided into two categories - transducers and basic media conversion devices, and other (enhanced) equipment. Underlying carriers may offer equipment in the first category in conjunction with their tariffed services, but may not offer equipment in the second category. Resale carriers and subsidiaries may offer either type of equipment on either a tariffed or a non-tariffed basis.

• Enhanced non-voice services are divided into two categories - enhanced non-voice communication services and enhanced non-voice data processing services. Resale carriers and subsidiaries may offer both categories of services - communication services under tariff, and data processing services on an unregulated basis.

This revised approach would do much to reduce the impact of the data communication/data processing definitional conundrum on the development of innovative 'enhanced' services. Carriers would have incurred a substantial risk in developing such services under the Commission's original proposal: if a new service were classified as data processing, the developer would be barred from providing it altogether. This would not be true under the revised approach, since resale carriers would be permitted to offer both enhanced communication and enhanced data processing services.

The Commission's Tentative Decision still requires a definitional distinction between communication and data processing services, since the enhanced non-voice category subsumes both types of services and only the first would be subject to regulation. Paragraph 83 of the Tentative Decision proposes that this distinction be based on the following definitional structure:







• 'Computer processing' is the use of a computer for processing information where the output information constitutes a programmed response to input information. 'Processing' includes, inter alia, arithmetic and logical operations, storage, retrieval, and transfer.

• 'Data processing' is the computer processing of input information for the purpose of providing additional, different, or restructured information.

• A 'data processing service' is the offering for hire of computer processing capabilities for the purpose of: transforming or altering for the subscriber of the service the information content or meaning of information provided by the subscriber; maintaining, managing, or providing a data information bank or information retrieval service, whereby information may be selectively retrieved by or for a subscriber to the service; or monitoring or controlling an ongoing non-communications process or event.

• 'Hybrid data processing service' is an offering of a data processing service utilizing common carrier communications facilities for the transmission of data between remote computers and customer terminals.

These definitions represent a significant improvement on those proposed in the Commission's earlier notices. They properly associate the 'programmed response' criterion with computer processing generally, rather than with data processing in particular; they clarify the important distinction between internal data processing and the provision of a data processing service; and, perhaps most important, they acknowledge the substantial economic benefits of commingling communication and data processing functions in a single service offering. In sum, they provide a useful general framework for regulatory decision making.

A major concern with the revised definitions is that they depend on a common understanding of certain key concepts which are not defined. The proposed definition for data processing will be useful only if all interested parties can agree on whether a given function does or does not provide 'additional, different, or restructured information'. Similarly, the proposed definition for data processing service hinges on agreement as to whether the 'information content' or 'meaning' of input data has been 'transformed' by a function, and on agreement as to what constitutes a 'non-communications' process or event. Revised Section 64.702(b) of the Commission's Rules states, in part, that:

... any data processing performed by a carrier as part of a tariffed service must directly relate to and be for the purpose of providing a communication service ...

This provision will be difficult to implement in the absence of an agreed definition for 'communication service'. The pleadings in connection with NY Telephone Company's Room Status and Selection offering 27 clearly demonstrate the extent to which competitors can differ in their interpretation of these words. The problem here is evident: the definition of words in terms of other words is an endless process with abundant potential for misinterpretation. The entropy criterion proposed in this article could mitigate this problem, and assist the Commission in implementing its Tentative Decision in the Second Computer Inquiry, by providing a more precise method of distinguishing enhanced non-voice communication services from enhanced non-voice data processing services.
As noted above, the entropy criterion provides a mathematical method of determining when the 'information content' of input information has been 'transformed' by a function. The proposed criterion also classifies 'selective retrieval' and 'process control' functions in exactly the manner proposed by the Commission, thus eliminating the need for separate definitional treatment of these functions. I do not regard the simple monitoring of a non-communication process as a data processing function, and the proposed criterion would not classify it as such.

The essential point in applying the proposed criterion in the context of the FCC's Tentative Decision is that any service can be regarded as a collection of distinct input/output functions. It follows that any enhanced non-voice service can be classified as communication or data processing by classifying its component functions. Various detailed classification procedures could be used. The simplest would be the following:

27 Op cit, Ref 10.

• Classify each function within the service in question as a communication or data processing function, based on the entropy criterion.

• Compute the relative proportions of communication and data processing functions within the overall service.

• Classify the overall service as communication or data processing by comparing, for example, the proportion of data processing functions within the service with a predefined threshold.
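The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure given in the article: each function is modelled as a deterministic mapping over a small, non-empty finite domain, all distinct inputs are assumed equally likely, the 0.5 threshold is an arbitrary placeholder, and every name (`classify_function`, `classify_service`, the example service) is hypothetical. Under the uniform-input assumption a function preserves entropy exactly when it is one-to-one on its domain.

```python
import math
from collections import Counter


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def classify_function(func, domain):
    """Step 1: compare input and output entropy for one function.

    Assumptions (not from the article): `domain` is non-empty and its
    distinct elements are equally likely inputs.
    """
    inputs = list(dict.fromkeys(domain))  # distinct inputs, order kept
    n = len(inputs)
    h_in = math.log2(n)  # entropy of a uniform distribution over n inputs
    counts = Counter(func(x) for x in inputs)
    h_out = entropy_bits(c / n for c in counts.values())
    if math.isclose(h_in, h_out, abs_tol=1e-9):
        return "communication"  # entropy-preserving
    return "data processing"  # entropy-reducing


def classify_service(functions, threshold=0.5):
    """Steps 2 and 3: share of data processing functions versus a threshold."""
    labels = [classify_function(f, d) for f, d in functions.values()]
    share = labels.count("data processing") / len(labels)
    label = "data processing" if share > threshold else "communication"
    return label, share


# Hypothetical service with three component functions.
example_service = {
    "relay": (lambda x: x, range(32)),                        # one-to-one
    "code_conversion": (str.upper, "abcdefghijklmnopqrstuvwxyz"),  # one-to-one
    "parity_summary": (lambda x: x % 2, range(32)),           # many-to-one
}
label, share = classify_service(example_service)
print(label, round(share, 2))  # communication 0.33
```

Counting every function equally is the simplest weighting; weighting by frequency of use or by emphasis in the supplier's marketing would be a different design choice.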

This procedure would enable the Commission to determine the 'primary purpose' of any given service in quantitative terms. While conceptually simple, this classification procedure is not necessarily preferable, and it could be complex and burdensome in some situations. Another alternative, offering the benefits of increased flexibility and reduced regulatory burden, would be to classify services on the basis of certain 'key functions' provided. The subset of functions examined could be selected on the basis of various criteria, including:

• Functions emphasized by the service supplier in soliciting subscribers.

• Functions judged to be most characteristic of the overall service.

• Functions most frequently used by service subscribers (an ex post facto determination).

Summary

The proposed criterion defines the term 'function' in the mathematical (or input/output) sense, and divides the set of all discrete functions into two mutually exclusive categories: entropy-preserving functions and entropy-reducing functions. The entropy-preserving functions are associated with data communication, and the entropy-reducing functions are associated with data processing. Preliminary applications suggest that the resulting definitions for data communication and data processing conform closely with intuitive expectations, and are sufficiently simple and precise to be used in classifying functions within actual services. As an example, the proposed definitions would place the editing functions of the Dataspeed® 40/4 in the data communication category, but would classify one function within the controversial Room Status and Selection feature of the Dimension 2000® PBX as data processing.

Acceptance or rejection of the proposed definitions is ultimately a regulatory policy decision, and should be weighed carefully from the standpoint of market consequences. It appears that the proposed criterion could assist the Commission in implementing its Tentative Decision in the Second Computer Inquiry by providing a more precise method of distinguishing 'enhanced non-voice communication' services from 'enhanced non-voice data processing' services.


TELECOMMUNICATIONS POLICY March 1980