Development of IR systems: New direction


Information Processing & Management, Vol. 32, No. 3, pp. 373-386, 1996. Copyright © 1996 Elsevier Science Ltd. Printed in Great Britain. All rights reserved. 0306-4573/96 $15.00+0.00. 0306-4573(95)00057-7

Pergamon

DEVELOPMENT OF IR SYSTEMS: NEW DIRECTION

VALERY I. FRANTS,¹ JACOB SHAPIRO² and VLADIMIR G. VOISKUNSKII³

¹Department of Computer and Information Science, Fordham University, Bronx, NY 10458, U.S.A., ²Department of Statistics and Computer Information Systems, Baruch College (CUNY), New York, NY 10010, U.S.A. and ³National Library of Russia, 18 Sadovaya Street, St Petersburg 191069, Russia

(Received April 1995; accepted July 1995)

Abstract: The paper addresses the question of improving the quality of satisfaction of an information need (IN) through the use of information retrieval (IR) systems. This improvement is achieved by taking the characteristics of IN more fully into account when developing IR systems. For this reason the characteristics of IN are investigated carefully, and on the basis of this analysis the paper proposes a new instrument in IR systems which takes into account different components of IN. Special attention is paid to the practical implementation of such IR systems. Using the goal component of IN as an example, the paper demonstrates how to take this component into account in the process of IR system functioning, namely by using a list of tasks and requirements. It is shown that several problems have to be addressed during the implementation of this list, and the paper discusses some possible solutions. The suggested methods and approaches are easily adaptable to any IR system and their realization is quite simple. The paper stresses the importance of this new direction in developing IR systems. Copyright © 1996 Elsevier Science Ltd

INTRODUCTION

Any information activity is aimed, directly or indirectly, at satisfying an information need (IN). This activity per se consists of an actually existing collection of methods and forms used in the satisfaction of IN. These methods and forms supplement rather than duplicate each other, in the sense that only taken together do they take into account, to a certain extent, all presently known properties and features of IN. This paper deals with one form of satisfying IN: the information retrieval system (IR system). Since an IR system (like any other form) is developed only because IN exists and only with the purpose of satisfying IN, it seems obvious that the more comprehensive the set of IN properties and characteristics taken into account in the development of IR systems, the more successful the satisfaction of IN. This statement is not new: as early as 1974 it was discussed in a paper by Voiskunskii and Frants, and in 1988 Frants and Brush considered in great detail the effect of some IN properties on the structure and design of IR systems. We mention it here because the study of new IN features, with subsequent analysis of their effect both on the nature of user service and on IR system design, represents an important and promising line in the development of such systems. Since the present work continues in this direction, we will begin our discussion with an overview of IN.

THE NEED FOR INFORMATION

Even long before the rise of Information Science, IN was studied by psychologists (with limited success). As Information Science became more mature, and especially with the beginning of the development of IR systems, interest in the study of IN increased considerably. For example, the yearbook of the International Federation of Documentation (1973) pointed out that 144 works on IN study were published by 1967. Nevertheless, despite the fair number of studies, no essential results have been obtained. This was indicated in the preface of the yearbook mentioned above: "IN studies have obviously not lived up to expectations, because they only helped to discover what has already been known through the practice of publishing scientific and technical literature, as well as from the experience gained from library/bibliographic work". Some authors (see, for example, Jahoda, 1966) even denied the value of IN study, whereas others (see, for example, O'Connor, 1967, 1968) believed that it was impossible to determine what should be meant by IN. Moreover, since the mid-1970s, to a certain extent as a result of such beliefs, the number of publications on IN study has decreased dramatically.

Nevertheless, the situation has proved to be not so gloomy. Several authors (see, for example, Allen, 1991; Bates, 1989; Belkin, 1987; Belkin & Croft, 1987; Buckland & Florian, 1991; Fidel & Soergel, 1983; Lancaster, 1968, 1979) introduced some interesting ideas which contributed to a better understanding of the phenomenon of IN. The present paper takes as its basis the approach to understanding IN which was developed by Voiskunskii and Frants (1974), and Frants and Brush (1988).

As a starting point, we note that nobody doubts the necessity of satisfying IN. This is assumed to be obvious. Without casting doubt on this fact, we do ask the following question: what causes this necessity, i.e. why should man's IN be satisfied? The question is not a result of idle curiosity. Clarifying the role of information in man's life contributes to the understanding of the mechanism by which IN originates and aids in revealing its properties and characteristics. And why do we need this? As mentioned above, a more comprehensive consideration of knowledge about the properties and characteristics of IN in the creation of IR systems would allow for a better satisfaction of IN.
A detailed discussion of the role of information in man's life, as well as of the mechanism of the formation of IN, is given in the Appendix. It is placed there not because the issue is unimportant, but because a full treatment is well beyond the scope of the present work. So in the following presentation we will restrict ourselves to a summary of the most essential (within the scope of the present work) statements and conclusions.

Does a man need information, and why? At first glance it seems obvious that a man needs only matter and energy in order to live (survive). But the habitat is not a nutritious broth in which all a man needs to do is to open his mouth to become full. It is necessary to find resources, and to be able to get and use them. All this is possible solely on the basis of information. That is why man's conscious activity, any of his conscious behavior, is aimed at his survival and is possible only on the basis of information. Information is needed for, and precedes, both our simplest actions and the most sophisticated ones aimed at achieving the best conditions for survival in the future. For example, if we want to live we should know (have information about) where to find drinking water and how to get it, no matter what conditions we are in. This is the information that will govern our behavior. If we have no such information, we will try primarily to obtain it, i.e. our behavior will be governed by the information at our disposal regarding a method of obtaining the necessary information. In the absence of information, but when actions are mandated, we act at random and in doing so understand the unreliability of such actions. Information is also required in order to avoid hazards that exist in the habitat. Hence, we can assert that for his survival a man needs matter, energy and information. Now we will consider how IN originates.
It seems clear that a man does not need information in general but rather specific information pertaining to the situation which he is in. In other words, to choose the most successful line of behavior (in terms of survival) in a particular situation, a man should have the information about this situation which is required for such behavior. But in order to have such information, a man should have a desire to have it, i.e. he should feel the need for such information. Hence, on the one hand a need may be considered as a mechanism that pushes a man to seek necessary information, and on the other hand as a tool which helps to determine the usefulness (significance) of signals among the multitude of signals perceived by a man. Furthermore, it becomes clear that a man's need is always the result of the situation which he is in. It is the situation that gives rise to such a psychological state as IN.

But how does this occur, i.e. how does IN originate? A whole host of signals about both the status of the organism itself and the habitat are perceived by a man through his receptors. These signals enter the brain where they are used for assessing the situation. If the assessed situation contains a problem, for example it is understood that the organism needs a supply of food, or a direct hazard exists in the habitat, then inside the man an instinct for solving the problem arises. Solving a problem is possible only via some line of behavior, and the behavior itself is dictated by a behavioral algorithm which in turn is created on the basis of information. Therefore, in the event that the man does not have a ready-made algorithm, even the realization of the existing problem itself creates a mental state which is perceived as a need for the information required for creating a behavioral algorithm. Physically, such a state arises as a result of stimulating the brain by signals from receptors and is determined by the section of the brain that is stimulated and the form of this stimulation. In summary, we underline once more that a mental state such as an information need precedes any activity and is an indispensable partner in satisfying any physiological need.

Now we will mention briefly some basic statements on IN which are considered in detail in the above-cited papers of 1974 and 1988. The concept of "information need" was considered as a primary (basic) concept indicating the mental state of a man. In fact the thematic boundaries of a whole IN are never clearly defined and can vary with time, and many researchers have noted that the more comprehensive the knowledge a man possesses, the broader his IN boundaries are. There are various types of IN, and it should be noted that every type of IN has its own defining properties. They can be illustrated by the following examples. Say, a man may need information such as: (1) which car models were produced by Ford in 1994? (2) where was the world football championship of 1990 played?
On the other hand, a man may need information on such problems as: (1) study of information need; (2) waste-water treatment, etc. The information need represented in the first two examples differs from the one represented in the following two examples. Indeed, while the boundaries of the first type of IN are fully defined, in the second case such boundaries first of all cannot be clearly determined, and secondly they may vary with time. Besides, comprehension of the information in the first type of IN requires much less intellectual effort than it does in the second case. An essential difference is also the fact that if information is found for the first type of IN, this need is satisfied by it and, hence, the need per se "vanishes"; but the second type of IN is generally not satisfied when the required information is found and continues to persist for a long period of time.

The works mentioned above, which further develop the described idea of IN, consider the above distinctions between its different types as the properties of these types. It is these property sets, inherent in different IN types, that are proposed for use in identifying the IN types themselves. It is clear that besides the types considered there are other IN types as well. Moreover, refining the properties of the above IN types may generally lead to "splitting" each of them into several new types. Nevertheless, we believe it would be useful to have corresponding names for all well-defined IN types at each turn of the IN study. Incidentally, the first type of IN above is called a concrete information need (CIN), and the second type a problem-oriented information need (POIN). It should be noted that in general the information retrieval processes which are used to satisfy IN of various types differ from each other.
For example, in information retrieval satisfying the second type of IN, a dialogue with a user (feedback) is essentially necessary (the boundaries of this need are not clearly defined, and they are refined in the dialogue process), while in information retrieval satisfying the first type of IN a dialogue with the user may be useful but is not necessary. In addition, information retrieval satisfying POIN assumes iterations, i.e. it is carried out on each new array of entries (this need is generally not satisfied when the required information is found), while information retrieval satisfying CIN terminates after finding the required information. Information retrieval satisfying POIN is known as documentary retrieval, and information retrieval satisfying CIN as factual retrieval.

Both documentary and factual retrieval include a number of well-defined subprocesses: for example, translation of documents from a natural language into an information retrieval language (IRL), resulting in the creation of document profiles; translation of queries from a natural language into IRL, resulting in the creation of query formulations; etc. However, even identical subprocesses in different types of information retrieval are normally realized in different ways. Now that we have described the basic notions of IN, let us consider some IN features.
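The contrast between factual retrieval (for a CIN) and documentary retrieval (for a POIN) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: documents are plain strings, matching is a substring test, and the names `matches`, `factual_retrieval`, `documentary_retrieval` and `refine` are hypothetical rather than taken from any system discussed here.

```python
from typing import Callable, Iterable, List, Optional


def matches(document: str, formulation: str) -> bool:
    """Toy matching rule (an assumption): case-insensitive substring test."""
    return formulation.lower() in document.lower()


def factual_retrieval(collection: Iterable[str], formulation: str) -> Optional[str]:
    """CIN: terminate as soon as the required information is found."""
    for document in collection:
        if matches(document, formulation):
            return document  # the need "vanishes" once it is satisfied
    return None


def documentary_retrieval(
    batches: Iterable[List[str]],
    formulation: str,
    refine: Callable[[str, List[str]], str],
) -> List[str]:
    """POIN: search each new array of entries and refine the formulation
    after every pass; the user dialogue is modelled by `refine`."""
    output: List[str] = []
    for batch in batches:
        found = [d for d in batch if matches(d, formulation)]
        output.extend(found)
        # Feedback step: the user's reaction to `found` adjusts the formulation.
        formulation = refine(formulation, found)
    return output
```

The factual search returns at the first hit, whereas the documentary search never stops on its own: it continues for as long as new arrays of entries arrive, which mirrors the persistence of a POIN.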

COMPONENTS OF IN

It is obvious that the subject of the need is an individual, who will be called "the user" from here on. A need is experienced by the user as some sort of anxiety, stress, excitement, discomfort, etc. Of course each of the named mental states has its nuances, but they can be referred to one class, which can be called "displeasure". Satisfaction of a need (as a process) yields a sense of satisfaction, relief, enjoyment, etc. This class of mental sensations can be called "pleasure". A similar polarization of mental states was noted comparatively long ago. In the middle of the last century the German psychologist Fechner (1860) had already proposed a theory of pleasure and displeasure. He noted that insofar as conscious stimulation always relates to pleasure and displeasure, both pleasure and displeasure can be represented as having a psychophysical relation to stability conditions. This direction received new development at the beginning of our century in the theory of psychoanalysis, created by the Austrian psychologist Freud. He formulated the pleasure principles, the principles lying at the base of the organization of the subject's mental activity. However, the nature of these mental states was not investigated at the time. Authors considered pleasure and displeasure to be not the result of something but something given, a starting point, the origin of vital activity. They assumed the presence of the soul, i.e. a particular nonmaterial substance independent of the body, of which pleasure and displeasure are characteristic (a property of the soul). And Freud (1991), who felt dissatisfaction (due to lack of information?) with this explanation, wrote: "We would have been filled with gratitude for a philosophical or psychological theory that would have been able to explain to us the significance of how imperative the feelings of pleasure and displeasure are for us. Unfortunately, nothing is being suggested which is acceptable to us. This region is the darkest and most inaccessible region of mental life...".

It is obvious that an IN is directed to some object or phenomenon of the world surrounding us. Such a focus is generally known as a thematic one. Particular emphasis should be placed on the fact that each information user has a good idea about the thematic focus of his mental state, although, as pointed out above, in the event of POIN the thematic boundaries of this mental state are not always clearly defined. Since we are interested in satisfying IN by means of an IR system, the IR system should take into consideration the thematic focus of the user's IN. It is quite obvious that it enters the system directly from the user himself. However, since the thematic boundaries of POIN are not precisely sensed by the user, he cannot present to the IR system adequate information about his actual mental state, i.e. an IR system does not always have sufficiently comprehensive information on POIN to satisfy it successfully.

Consider next the way (how, in what form) the information on POIN enters the system. To begin with, for this purpose the user expresses his POIN. By IN expression is meant some process that results in a representation of information on IN in some language. The formulation of information about this need in a natural language is commonly called a query. Any query includes the thematic focus of the user's IN. We note that in most cases the user expresses IN in the form of queries, and the queries frequently contain only a theme (a thematic focus of the originated IN) or, as we refer to it below, a thematic component of IN. For example, the queries above illustrating the existence of CIN and POIN contain only a thematic component. But which other components characterize such a mental state as POIN? This will be considered next.
We begin by noting that such an unpleasant state as IN comes up in a variety of life situations, and these situations can determine different requirements for the information that a user needs. In other words, with the same thematic focus but in different situations, different information output may be more appropriate. Let us illustrate this by the following example. Assume that some documentary IR system has received an identical query from users A, B and C, namely: "Canning of vegetables". The thematic focus of this query is clear, and in many systems this is sufficient for constructing a query formulation (either automatically or via an intermediary) and subsequently performing a retrieval. As the result of the retrieval, each user who asked the query mentioned above gets the same output. Nevertheless, it is well known that in most cases the assessment of this output by users may be different. On the one hand this is explained by the lack of clear thematic boundaries of the corresponding IN, as already noted above, and on the other hand by the different knowledge levels inherent to each of the users, which was noted above as well. And these are not the only causes of different assessments of information output.

Continuing our example, let us assume that user A is a researcher engaged in problems of long-term storage of vegetables and is interested in new ideas on canning of vegetables. Clearly in this situation he has little interest in popular literature, textbooks and reviews of well-known works. Now let us suppose that user B wants to can several jars of cucumbers, tomatoes and cabbage at home, and his query is related to this situation. In this case the user will have little interest in recent theoretical works in this field and will be well satisfied by a popular article containing useful advice. In the case of user C, assume that he is offered several lines of research for selection (this query is only one of such lines) and he wants to make a decision as to which one to pursue in the future. The most appropriate (and sufficient) output for him would seemingly be a comprehensive review of scientific publications on canning of vegetables. So, it can be asserted that in different life situations different mental states (INs) can arise, but nevertheless the thematic component of different mental states may be the same. Besides, the various states arising in different users call for (see the example above) distinctive output for different users.
From the preceding it becomes clear that if a query includes not only the thematic component of IN but also other IN components, and these complementary components can be taken into account in retrieval, then the user service can be improved. It is no coincidence that we call any other IN components complementary ones. The point is that the thematic component of IN should always be expressed (as is actually done) and, as pointed out above, it is sufficient for retrieval. However, if only some other (non-thematic) component is expressed, retrieval becomes impossible. It is because of this that we call any component differing from the thematic one complementary. This is apparently the reason why in most systems the users are required to express only the thematic component of IN. And nevertheless, as we have seen already, complementary components can provide a marked positive effect in user service.

To be fair, it should be noted that in a number of cases complementary IN components have essentially been used by those librarians who communicated directly with readers, helping them to carry out a retrieval. These were librarians who showed special care for a reader by personalizing his need, i.e. they took into account some of his wishes and features typical of him personally (i.e. his situation), rather than simply following a query formulation (i.e. a thematic IN component). The librarians did this, of course, intuitively from their background of experience rather than on a serious scientific basis. Nevertheless, it resulted in better user service. Unfortunately, in automated IR systems this aspect of retrieval does not draw much attention, although it is obvious that if these systems are built with the special care for a user that is typical of librarians, then better effects could be expected from retrieval automation. Returning to the example above, note that the situations considered are related to different goals facing the users.
When we pointed out the usefulness of expressing complementary IN components in a query, we proposed in essence complementing a thematic component through a description of a specific user's goal. However, the notion "query" should perhaps retain the meaning customary for researchers, namely: a query usually means the result of expressing the thematic component of IN in a natural language. Because of this, the "portion" of the IN presentation in a natural language which is dedicated to the description of the goal at hand we will name a task, i.e. a task is the result of expressing the goal component of IN in a natural language. Thus, we have shown that it is reasonable to present the result of expressing IN in a natural language in the form of at least two independent retrieval directives: a query and a task. We say "at least" because there are also other IN components, depending on the existing situation, whose consideration may improve retrieval results. For example, these can be time restrictions on information retrieval and reading (the user has only 40 min for this), limitations related to the presentation level (the work is a purely philosophical one, whereas a more applied one is desired), constraints pertaining to the language in which the document is written, etc. In these cases we can talk about the time component, level component and language component of IN. Clearly, expressing these components in a natural language assumes the availability of other retrieval directives.
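One way to picture the directives named above is as a single record in which the query is mandatory and every complementary component is optional. The sketch below is an assumption made for illustration: the class name `RetrievalDirectives` and its field names are hypothetical, and representing each component as a plain string or number is a simplification.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass(frozen=True)
class RetrievalDirectives:
    query: str                             # thematic component, always required
    task: Optional[str] = None             # goal component
    time_limit_min: Optional[int] = None   # time component, in minutes
    level: Optional[str] = None            # level component, e.g. "applied"
    language: Optional[str] = None         # language component

    def __post_init__(self) -> None:
        # Retrieval is impossible without a thematic component.
        if not self.query.strip():
            raise ValueError("a query (thematic component) is required")

    def complementary(self) -> Dict[str, object]:
        """Return only the complementary components the user expressed."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if f.name != "query" and getattr(self, f.name) is not None
        }
```

For instance, `RetrievalDirectives("Canning of vegetables", task="Review preparation", time_limit_min=40)` carries a query, a task and a time restriction, while an empty query is rejected outright.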

THE EFFECT OF IN COMPONENTS ON THE IR SYSTEM

We will define a set of retrieval directives of different types, with at least one of them being a query, as a retrieval situation. The present paper will discuss only the situations formed by a query and a task. In this case each new query and each new task forms a new retrieval situation. Note that the same task can be combined with various queries, and the same query with various tasks. For example, the task of preparing a review can be combined with the queries "Study of IN", "Drugs for cold", etc., and the query "Drugs for cold" with the tasks "Making up a medicine chest for travel", "Development of recommendations for patients", etc. Let us explain why the latter two tasks differ in essence from the standpoint of carrying out information retrieval. The point is that the first task suggests retrieval of information on as few as two or three drugs that are optimal in a certain sense for travelers, say, fast-acting ones in compact, watertight, non-breakable packaging. The second task, by contrast, suggests finding information on all existing drugs for cold in order to select the most suitable one for each patient.

Next, we will show some features of how a task affects both the organization of the retrieval process and the realization of its individual subprocesses. Let us consider two retrieval situations, assuming that one of them is stated by the query "Waste-water treatment" and the task "Development of new improved treatment methods", and the second by the same query and the task "Review preparation". Clearly the first retrieval situation assumes organizing an iterative retrieval process, but the second one can generally be restricted to a single retrospective retrieval. So, the example illustrates rather clearly how a task affects the organization of a retrieval process.
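The two points made so far, that queries and tasks combine freely into retrieval situations and that the task decides how the retrieval process is organized, can be sketched as follows. The lookup table `TASK_ORGANIZATION`, the default for unlisted tasks, and all identifiers are assumptions introduced for illustration; only the two table entries follow the waste-water example.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import Iterable, List


class Organization(Enum):
    ITERATIVE = "iterative"                  # repeated on each new array of entries
    RETROSPECTIVE = "single retrospective"   # one pass over the existing collection


@dataclass(frozen=True)
class RetrievalSituation:
    query: str   # thematic component
    task: str    # goal component


# Hypothetical table; the two entries mirror the example in the text.
TASK_ORGANIZATION = {
    "Development of new improved treatment methods": Organization.ITERATIVE,
    "Review preparation": Organization.RETROSPECTIVE,
}


def organize(situation: RetrievalSituation,
             default: Organization = Organization.RETROSPECTIVE) -> Organization:
    """Pick the process organization from the task alone.
    The default for unlisted tasks is an assumption of this sketch."""
    return TASK_ORGANIZATION.get(situation.task, default)


def situations(queries: Iterable[str], tasks: Iterable[str]) -> List[RetrievalSituation]:
    """Every query can be combined with every task."""
    return [RetrievalSituation(q, t) for q, t in product(queries, tasks)]
```

Note that `organize` never inspects the query: in this sketch the same query yields an iterative or a retrospective process depending solely on the accompanying task.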
Another illustrative example is given by two retrieval situations, one formed by the query "Development of extra-accurate watch mechanisms" and the task "Creation of new mechanisms", and the second by the same query and the task "Examining an idea for novelty". The first retrieval situation, like the similar one in the example above, assumes organization of an iterative retrieval process. As for the second situation, it is sufficient there, for example, to find a document where the proposed idea has already been presented, i.e. after finding such a document the retrieval process may be terminated.

We will next show how a task affects the realization of retrieval subprocesses as well. In this case, as can be seen from the following, the differences in the realization of subprocesses are derived from the differences in the requirements that particular tasks impose on retrieval results. Refer again to the retrieval situations of which one is formed by the query "Drugs for cold" and the task "Making up a medicine chest for travel", and the second by the same query and the task "Development of recommendations for patients". As was mentioned above, the task of the first retrieval situation "necessitates" finding two or three drugs for cold that are optimal in a certain sense during travel, whereas the task of the second retrieval situation requires finding all existing drugs for cold and perhaps even drugs of more general designation. To meet "the requirements" of the task "Making up a medicine chest for travel", it is first necessary to identify those characteristics of drugs for cold which define the usefulness of these drugs during travel, and secondly to construct a query formulation so that it can be determined during the retrieval process whether or not the characteristics mentioned have the desired values.
As for the task "Development of recommendations for patients", consideration of its "requirements" leads to solving other problems, namely: constructing a query formulation so that it will be possible to determine during retrieval whether or not an illness such as a cold is found among the indications of direct or prophylactic effect of the drug under consideration. The example above demonstrates clearly how a task affects the realization of such a subprocess of the retrieval process as the construction of a query formulation.

It should be noted that we are speaking of a subprocess of constructing a query formulation rather than a subprocess of translating a query into IRL. With the new understanding of IN and of its representation in a natural language, the subprocess under consideration is not only a translation of a query into IRL; it amounts to presenting in IRL the information about the user's information need that is contained in both a query and a task. It is obvious that for such a subprocess a system's IRL should allow indexing both a query (this is always the case) and a task (this occurs seldom). Clearly it is possible that not all information about IN that is contained in a query and a task will be represented in IRL. First of all this depends on the quality of the system's IRL and on the methods of constructing query formulations. Note that it is natural to call the result of expressing a query and a task in IRL a query formulation. This, by the way, agrees with the accepted understanding of a query formulation, although components other than the thematic one are rarely used. It is because of this fact that we call the subprocess under consideration a subprocess of constructing a query formulation.

It seems useful to consider a further example of the influence of a task on the realization of subprocesses of a retrieval process. With this aim we will refer again to the retrieval situations of which one is formulated by the query "Waste-water treatment" and by the task "Creation of new improved treatment methods", and the other by the same query and by the task "Preparation of review". It may be stated that usually in the selection of documents for a review, the thematic boundaries of queries are treated considerably more widely than in the selection of documents with the aim of developing new treatment methods. This can be taken into account in different ways.
For example, in the construction of a query formulation within the scope of the second retrieval situation, we can additionally use a "broader" glossary than in the construction of a query formulation within the scope of the first retrieval situation. Alternatively, we can apply a tougher criterion on output within the scope of the first retrieval situation than within the scope of the second one, etc. It is obvious, however, that regardless of the method selected, the realization of subprocesses of the retrieval process in the given retrieval situations will be different. This confirms the existence of task influence on the realization of subprocesses of the retrieval process.

So, we have considered a number of examples showing that the task affects both the organization of the retrieval process and the realization of its individual subprocesses. And nevertheless, as noted above, in interaction with a user for the purpose of obtaining information on his IN, preference is usually given to expressing the thematic component of this need rather than any other (complementary) ones. In particular, special procedures for expressing a thematic component are under development, such as the formulation by the user of a list of key words "revealing" his IN, or a user-prepared list of document titles or abstracts conforming to his IN; unfortunately, nothing comparable is done in order to express a goal component. "Unfortunately", because we have already seen what an important role taking account of the goal component of IN could play in forming the required retrieval results. It seems it can be stated that the more accurately a goal component is expressed, the better the retrieval result that can be expected. Hence, it is necessary to give due attention to the development of special procedures enabling one to express a goal component in the most accurate way.
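The two devices mentioned above, a "broader" glossary for the review task and a tougher output criterion for the development task, can be sketched as follows. Everything concrete here is an assumption made for illustration: the glossary contents, the scoring rule (the share of query terms covered by a document), the threshold values, and the fact that a single `TaskProfile` combines both devices, whereas the text presents them as alternatives.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set

# Hypothetical glossaries mapping a query term to the terms accepted for it.
NARROW: Dict[str, Set[str]] = {"waste-water": {"waste-water", "sewage"}}
BROAD: Dict[str, Set[str]] = {"waste-water": {"waste-water", "sewage", "effluent", "sludge"}}


@dataclass(frozen=True)
class TaskProfile:
    use_broad_glossary: bool
    min_score: float   # output criterion: required share of covered query terms


DEVELOPMENT = TaskProfile(use_broad_glossary=False, min_score=1.0)  # tougher criterion
REVIEW = TaskProfile(use_broad_glossary=True, min_score=0.5)        # wider boundaries


def formulate(query_terms: List[str], profile: TaskProfile) -> List[FrozenSet[str]]:
    """Build a query formulation from the query terms and the task profile."""
    glossary = BROAD if profile.use_broad_glossary else NARROW
    return [frozenset(glossary.get(term, {term})) for term in query_terms]


def score(document: str, formulation: List[FrozenSet[str]]) -> float:
    """Share of term groups that have at least one term in the document."""
    if not formulation:
        return 0.0
    words = set(document.lower().split())
    return sum(1 for group in formulation if group & words) / len(formulation)


def retrieve(collection: List[str], query_terms: List[str], profile: TaskProfile) -> List[str]:
    formulation = formulate(query_terms, profile)
    return [d for d in collection if score(d, formulation) >= profile.min_score]
```

With the same query terms, the development profile returns a subset of what the review profile returns, so the two retrieval situations are indeed realized differently even though the query is identical.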
We believe a good step in this direction is an approach to expressing IN that treats the task formulation as an independent retrieval directive. This very approach stimulates the user to recognize the goal component of his IN as clearly as possible. Furthermore, formulating a query and a task separately would perhaps be simpler and more convenient for a user than attempting to express both the goal and the thematic components of IN in a single text. In this case, the development of special procedures for expressing the goal component of IN becomes more purposeful. For example, in the case considered, a rather natural procedure would be for the user to examine a list of "typical" tasks. Then the user easily "sees" his task or, using this list, is able to formulate the goal component clearly enough.

We will dwell on the concept of typical tasks in more detail. We consider "typical" those tasks that are found rather frequently during information retrieval and can form retrieval situations with a great number of various queries. The typical tasks may rightfully include, say, the task "Development of new methods, devices, approaches, machines, etc." or the task "Review preparation". As for such a task as "Making up a medicine chest for travel", from our standpoint this task is not typical. The principal cause of this is the too specific nature of this task or, as we will say from here on, its not very high "generalization level". This level ranks below the "generalization level" of such tasks as "Development of new methods..." and "Review preparation".

It is clear that in identifying a specific task we can determine the requirements of this task for the organization of the retrieval process and for its results. It is reasonable to include in the typical task list not only the task itself but also the requirements dictated by this typical task. This will help a user to understand what is concealed behind the formulation of a particular task and will allow him to obtain a more accurate notion of it. It should be noted that in the perception of some users a typical task may be related to requirements differing from those in a typical task list. For example, initially the task "Development of new methods..." would have been related to the following natural requirement for retrieval results: the output obtained should be as "close" as possible to an "ideal" one, i.e. containing all the relevant documents available in the collection of documents and only those. At the same time, in the user's view the task mentioned may be related to the requirement of obtaining output containing no more than three documents, but only relevant ones. How should one respond in such a case? It is quite obvious that for a particular user the retrieval should be carried out on the basis of his (the user's) understanding of a task, i.e. using those requirements which he designated in a typical task list and/or formulated as ones resulting from the given task, but only if realization of such requirements is feasible.
Say, a requirement for obtaining output containing no less than 150 relevant documents may be impossible to meet, in principle, due to the lack of such a quantity of relevant documents even in a relatively large collection of documents. It seems reasonable to include additional "feasible" requirements in the list of available typical tasks since it can provide other users with better insight into what is concealed behind the formulation of a corresponding task. In this case, some requirements may be mutually exclusive, as for example the requirements for retrieval results considered above in the case when a collection of documents contains, say, 20 relevant documents. This, however, should not create problems since a user defining requirements in a typical task list that result, in his opinion, from his task will be able to select the most suitable one. Note that practical experience shows that, when users consider whether specific requirements follow from a given task, some requirements are agreed on by the majority of users and some are chosen only by a small group of users. Hence, in practice when using a list of typical tasks it is possible for each of them to determine (and isolate) a corresponding set (or sets) of requirements which are most often pointed out by users. The set (or sets) involved will be called a typical configuration (configurations) of requirements. From our standpoint it is reasonable to define, in connection with each task, two variations of typical configurations of requirements, namely: a typical configuration of requirements for organization of the retrieval process and a typical configuration of requirements for its results. It is then possible to identify (e.g. with a special mark) these configurations of requirements in a list of typical tasks since such information may be helpful for a user.
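The isolation of a typical configuration from users' choices can be illustrated with a short sketch. It is only one possible reading of "most often pointed out by users": a requirement is treated as typical when more than a given share of users selected it for the task. The function name, the threshold `min_share` and the requirement labels are assumptions made for the illustration and do not come from the paper.

```python
from collections import Counter


def typical_configuration(user_choices, min_share=0.5):
    """Return the requirements selected by more than `min_share` of users.

    `user_choices` holds one collection of requirement labels per user,
    all relating to the same typical task. The majority threshold is an
    assumption of this sketch, not a rule stated in the paper.
    """
    total = len(user_choices)
    if total == 0:
        return set()
    counts = Counter()
    for choice in user_choices:
        # set() ensures a requirement is counted once per user
        counts.update(set(choice))
    return {req for req, n in counts.items() if n / total > min_share}


# Hypothetical choices of four users for one task
choices = [
    {"output close to ideal", "narrow glossary"},
    {"output close to ideal"},
    {"at most 3 documents, only relevant"},
    {"output close to ideal", "narrow glossary"},
]
print(typical_configuration(choices))
```

Requirements chosen only by a small group of users remain available in the task list; they are simply not marked as part of the typical configuration.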
Next we point out that the development of typical configurations of requirements assumes both development of new and updating of existing methods of information retrieval. For example, we recall from the requirements pertaining to the task "Review preparation" that for this task it may be advantageous to use a "broader" glossary in the query formulation than in the case of the task "Development of new methods ...". This in turn may lead to differences in methods of constructing query formulations within the limits of one system. This example illustrates, in essence, how requirements affect the IR system construction. It is also a fact that the problem of defining typical configurations of requirements affects the problem of evaluating functional efficiency of information retrieval. For example, if the task "Development of new methods ..." is related to the requirement: obtained output should contain no more than three, but only relevant documents, in this case the evaluation of functional efficiency achieved is sufficiently "transparent" and requires the construction of a simple mathematical apparatus. But if this task is related to the requirement: obtained output should be as "close" as possible to the "ideal" one, in this case the evaluation of functional efficiency achieved requires the development of a rather sophisticated apparatus capable of determining a degree of "closeness" of different outputs. So, once more we call attention to the fact that with regard to the availability of tasks and requirements in the IR system we are, in essence, talking about a new element of a system structure--an element which provides a more comprehensive satisfaction of IN through consideration of its complementary components. It is quite obvious that the proposed method of incorporation of this element, i.e. preparation of a list of tasks and requirements for them, can be easily realized and does not entail any considerable costs. It is also obvious that in various cases the quality of realization may be different and eventually will be determined by the improvement in the user's service. Because of this, in the future, it may be reasonable to create a type of procedure for the development of the proposed system element. However, the development of this system element constitutes only a part of the work. A question arises: how to practically accomplish interaction of this system element with other system elements and especially how to automate this interaction? Of course, specific methods of consideration of possible requirements will have to be developed. Nevertheless, we present as an example one of the methods developed which is suitable for two tasks mentioned above, namely: "Development of new methods ..." and "Review preparation". As pointed out above, the first task provides for finding documents containing new ideas (approaches) contributing directly to "development of new ...", while in the second task it is required primarily to find those documents which contain descriptions of the most known and promising methods among existing ones. We will start from the fact that in the first case tougher requirements should be imposed upon relevancy of found documents (the requirement for the first task), whereas in the second case the requirements could be much less stringent (the requirement for the second task).
It should be noted that specific methods depend, to a large degree, on the approaches adopted in a system for organizing other system elements. In the given case, when solving a formulated task, we will consider the most typical existing IR systems, i.e. the systems that use Boolean search. Moreover, we will only consider automatic methods of taking the above requirements into account. Besides, we believe it is reasonable to take into account the requirements given in the example at the stage of constructing a query formulation, i.e. to form different (required) outputs by different query formulations. All above-listed conditions can be taken into account in those IR systems which use the algorithm of automatic construction of query formulations described by Frants and Shapiro (1991). The given algorithm provides a very important possibility for our situation, namely: this algorithm is suitable for constructing query formulations capable of finding a given number of documents, and the found documents will be the best (according to the chosen criteria) for the user. This feature is provided by the selection algorithm for query formulation, which is a part of the algorithm of constructing query formulations. Returning to those requirements which we are going to take into account, we note that the requirement for the first task provides for a more "narrow" search, i.e. in this case the output should be relatively small. As for the requirement for the second task, it provides for a "broader" search, i.e. output resulting from such a search should be considerable. So, both requirements may contain either a user-specified desired number of documents in the output, or, in the case when this number is not specified by a user, some pre-determined number specified by the system's developers as a standard parameter. Such a number for the requirement for the first task may be 10, and for the second one 100.
These numbers are specified for the automatic zone selection algorithm, following which the algorithm of automatic construction of query formulations will generate two query formulations: one will form output consisting of the 10 best documents and the other of the 100 best. Note especially that the requirements in the described case are automatically taken into account.
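The way a task can drive the size of the output may be sketched as follows. This is a minimal illustration under stated assumptions and not the algorithm of Frants and Shapiro (1991), which constructs Boolean query formulations: here a plain ranking by a numeric score stands in for "the best documents according to the chosen criteria". The names `TypicalTask`, `TASKS`, `requested_output_size` and `best_documents`, as well as the scores, are hypothetical; only the default sizes 10 and 100 are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypicalTask:
    name: str
    default_output_size: int  # standard parameter set by the system's developers


# Default sizes follow the example in the text: 10 and 100 documents.
TASKS = {
    "development": TypicalTask(
        "Development of new methods, devices, approaches, machines, etc.", 10
    ),
    "review": TypicalTask("Review preparation", 100),
}


def requested_output_size(task_key, user_size=None):
    """Use the user's desired number of documents if given, else the task default."""
    if user_size is not None:
        if user_size <= 0:
            raise ValueError("desired number of documents must be positive")
        return user_size
    return TASKS[task_key].default_output_size


def best_documents(scored_docs, size):
    """Return the identifiers of the `size` highest-scoring documents.

    `scored_docs` maps a document identifier to a score; the score is a
    stand-in for the system's own selection criteria (an assumption).
    Ties are broken by identifier so that the result is deterministic.
    """
    ranked = sorted(scored_docs.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:size]]


scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
print(best_documents(scores, requested_output_size("development", user_size=2)))
```

Keeping the task list separate from the selection step means that the same query can yield a "narrow" or a "broad" output simply by changing the task, which is the point of treating the task as an independent retrieval directive.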

FUTURE RESEARCH

The direction for improving the quality of satisfaction of IN by means of the IR system represents a wide area of future studies. This section will briefly list only basic problems, whose solution will contribute to the development of improved IR systems. First we note that it will be necessary to continue the study of IN components, their properties and characteristics. This is clear because more profound insight into IN will not only introduce new elements into the IR system structure, but will also result in a change of meaning of some presently known approaches used in the systems. Of interest are developments in the field of creating lists (tables) of tasks and requirements. It is obvious that for achieving better results, their creation should be based on sufficiently effective procedures or algorithms. Undoubtedly, the development of these means will contribute to the success of the approach discussed in this paper. Another important problem is the development of methods (primarily, automatic) for taking into account the effect of complementary IN components on the results of information retrieval. The optimal solution seems to be the development of an algorithm which takes into account each requirement on the list. Obviously, this is yet another direction in the development of the design of IR systems. It should be noted that in the present paper we have not explicitly dealt with a very promising direction, the development of user-friendly interfaces. However, both the orientation of this paper toward providing help for a user in his efforts to express his IN, and the cited methods of taking into account the complementary IN components by the system are undoubtedly steps along this path. Moreover, the methods mentioned could be an important element of future interfaces or at least could considerably stimulate their development. Because of this, future studies directed toward using complementary IN components in the process of user-system dialogue (feedback process), in addition to their own value, are also important in the development of user-friendly interfaces.

CONCLUSION

This paper mainly deals with the question of improving the quality of satisfaction of information need (IN) through the use of such a form of information service as an IR system. First of all, it was shown that the more comprehensively IN properties and characteristics are taken into account, the better IN is satisfied. In this regard, the paper refines the notion of IN itself, and then investigates IN complementary properties and characteristics. As a result, a mental state such as IN could be viewed as multidimensional because now it is considered as consisting of several components, i.e. not only in a thematic "plane" that the user is interested in (thematic IN component), but also in other "planes", e.g. in a plane of the user's goals (goal IN component). Thus, on the basis of the newly developed structure of IN, the paper deals with approaches to the development of improved IR systems, i.e. systems which would incorporate a mechanism for taking into account various IN components. For the reasons mentioned in the paper special attention was given to taking into account a goal IN component and its effect on the information retrieval process. A new tool (instrument) was proposed for taking this component into account, namely: a table of tasks and requirements. Realization of this instrument gives rise to a whole range of new solutions which are considered in the paper in detail. This realization clearly illustrates, among other things, the dependence of IR system construction and design on IN properties and characteristics, and points out the potential of this direction of research for developing more advanced IR systems and improving the quality of user service. It should be especially emphasized that the proposed methods and approaches are suitable for any IR system, and their implementation is relatively simple.

REFERENCES

Allen, B. L. (1991). Cognitive research in information science: Implications for design. Annual Review of Information Science and Technology, 26, 3-37.
Bates, M. J. (1989). The design of browsing and berrypicking techniques for the online search interface. Online Review, 13, 407-424.
Belkin, N. J. (1987). Discourse analysis of human information interaction for specification of human-computer information interaction. Canadian Journal of Information Science, 12, 31-42.
Belkin, N. J., & Croft, W. B. (1987). Retrieval techniques. In M. Williams (Ed.), Annual review of information science and technology (pp. 109-145). New York: Elsevier.
Buckland, M. K., & Florian, D. (1991). Expertise, task complexity, and artificial intelligence. Journal of the American Society for Information Science, 42, 635-643.
Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig.
Fidel, R., & Soergel, D. (1983). Factors affecting online bibliographic retrieval: A conceptual framework for research. Journal of the American Society for Information Science, 34, 163-180.
Frants, V. I., & Brush, C. B. (1988). The need for information and some aspects of information retrieval system construction. Journal of the American Society for Information Science, 39, 86-91.
Frants, V. I., & Shapiro, J. (1991). Algorithm for automatic construction of query formulations in Boolean form. Journal of the American Society for Information Science, 42, 16-26.
Freud, S. (1991). Ya i ono. Merani, Tbilissi.
International Federation of Documentation (FID 478), VINITI, Moscow, 1973.
Jahoda, G. (1966). Information needs of science and technology. Background review. In Proceedings of the 1965 Congress FID, Vol. 2. Washington, D.C., Spartan Books.
Lancaster, F. W. (1968). Information retrieval systems: Characteristics, testing, evaluation. New York: Wiley.
Lancaster, F. W. (1979). Information retrieval systems: Characteristics, testing, evaluation. New York: Wiley.
O'Connor, J. (1967). Relevance disagreements and nuclear request forms. American Documentation, 18(3), 165-177.
O'Connor, J. (1968). Some questions concerning "information needs". American Documentation, 19(2), 200-203.
Voiskunski, V. G., & Frants, V. I. (1974). Correction of query formulations in documentary information retrieval systems. Nauchno-Tekhnicheskaya Informatsiya (NTI), Ser. 2, No. 2, 1-12.

APPENDIX

Vital Activity and Needs

Every living thing strives to survive. Man is not an exception to this rule. All his actions and his conscious behavior are determined by a historically developed and constantly self-renewing striving for self-preservation, for improving the conditions of his existence. It is customary to assume that prolongation of life--survival--is the goal of man's activity, his life's work. This work itself is determined by needs, i.e. it is man's reaction to his own needs. It is these needs that constantly "prompt" what is concretely necessary for one to do for successful survival. Thus in the course of the whole history of man, his constant concern has been the satisfaction of his own needs. The need for information is among the most important of life's needs. It is information that permits one to successfully adapt oneself to the external conditions of existence--the environmental conditions. The intellect, whose "food" is information, permits man to realize this adaptation, making the environmental conditions not only part of his personal "I" but also of the social "We". The latter flows from man's social nature--one of the most important factors in his survival. For him society is not only the environment to which he must adapt. Much of that which is vitally important to man is provided only with the help of society. For example, to satisfy an information need (IN) we use information obtained by others, with the help of means and materials created by others, etc. This is why the striving to survive is often transformed into a striving to preserve society, to protect it, etc. The development of society, its progress, and its economic well-being depend in many respects on the intellectual productivity of the "creative layer" of society. Intensification of such productivity is possible by means of an increase in the quality of the information service for each "producing" intellect.
In other words, if we want to receive from each intellect "according to his ability" we must provide him information "according to his need." It is obvious that for high-quality satisfaction of an information need (IN), it is necessary to know and take into account the properties of this extremely important inherent human characteristic (the IN), and in creating any form of information service it is crucial to take into account the properties of the IN. The quality of these forms, for example, the quality of an IR system, will be mostly determined by how completely and how well these forms take account of the properties and characteristics of the IN during their service to the user. Hence, it is important to understand the nature of a person's IN as well as its properties and characteristics. To explain the role of an IN in a person's life, it is beneficial to consider first the person himself. We will consider man as a complex biological system, whose vital activity is carried out by its organically innate functioning algorithms for each of the processes of its subsystems. Moreover, the work of these algorithms in the complex is itself an algorithm of the vital activity satisfying the whole function of the entire system--the function of life. A healthy system is biologically balanced and is in a "comfort" state. The mechanisms of homeostasis as parts of the vital activity algorithms are responsible for preservation of this balance and for maintaining the system in the "comfort" state, i.e. in that state under which the system most easily survives and fulfills its functions. It is obvious that life is not a prerogative of man. He is only an element of a system such as the biosphere. This concept designates all life on our planet. Who would think there could be anything in common between systems such as a virus and a man? However, they are united by one property--they are living. They have algorithms of life: metabolism, reproduction, growth, motion, adaptability, mutability. The last two properties provide survivability of the individual and of the species. In living systems these algorithms are developed in such a way as to guarantee preservation of the species. All life has a cell structure, and it is possible to trace some "rising" expediency of life from the simplest to the most complex, from the elementary (lower) level to a more complex (higher) level, and subordination of simple to complex, lower to higher. Thus the cell is subordinate to the organism, the organism to the species, and the species to the biosphere. Subordinate in the sense that their life and death are justified as long as they are directed to the survival of a higher level. In a certain sense, cells play the same role for the organism as the organism plays for the species, and the species for the biosphere. Algorithms of living creatures are divided into three types depending on the goals which they satisfy: "for oneself", "for the family", and "for the species". They do not always act harmoniously; in various periods of life this or that one prevails. In addition, each element of the system in turn has its algorithms, which in principle can be classified by the same criteria: for a given part (e.g. a cell), for a higher system--an organ, or for a still higher system--the organism. The inherent algorithms can be suppressed in the interests of the higher system.
This situation exists, for example, in an organism where cell algorithms have different character, part of them working "for themselves", others for the organism. An organism's regulating systems, for example, suppress the excessive multiplication of cells for which every living system strives. The memory that preserves the algorithms of a species of organisms is deoxyribonucleic acid (DNA). Its molecules are located in the chromosomes of the nucleus of each biological cell. DNA reproduces itself precisely during cell division, which over a number of generations of cells and organisms guarantees the transfer of hereditary traits and specific forms of metabolism. The hereditary traits themselves are encoded in specific parts of the DNA molecules--genes. Each gene is responsible for the formation of some elementary trait. A unique property of genes is their combination of high stability (nonchangeability over a number of generations) with the capacity for inheritable changes--mutations, which are sources of the genetic diversity of organisms and the basis for the action of natural selection. Let us consider a man as a functioning and balanced system. For retaining balance, i.e. the normal course of all life processes, the system seemingly needs only matter and energy. For example, both are required for cell renewal. But these resources are provided by the system's habitat and this habitat is not a nutritious broth. Replenishment of resources always involves certain efforts, certain behavior in the habitat of the system. In other words, to get external supplies, the system should make contact with the habitat, and these contacts often require substantial expenditures of energy. It is obvious that these contacts should be goal-oriented and coordinated, rather than sporadic, unsystematic and uncontrolled.
It is because of this that the system possesses a nervous system as a subsystem which provides control over the whole system including its every interaction with the habitat. The nervous system determines a system's behavior, the behavior which is oriented toward resource replenishment, and to creation of the best living conditions. It is well known that any control is possible only on the basis of information. Consequently, a nervous system needs information to perform its functions. So, it can be stated that to survive the system requires matter, energy and information. But how does the nervous system perceive information? What prompts the system to do it and how? We consider it below in some detail. The diversity of the external world and the change in our internal states are detected by a system (an organism) with the help of a very large number of receptors, i.e. special biological monitors reacting by an electrical impulse (signal) to specific changes either in the external environment or the internal states of the system. Note that the receptors in man number into the billions. Receptors in a system are divided into two groups: the first is the exteroceptors, translating stimuli perceptible from the outside (from the habitat), and the second is the interoceptors, performing the same function within the organism (scanning all subsystems of the organism). The arising signals (nervous stimulations) are transmitted to the central nervous system consisting of the spinal cord and brain. Different parts of the brain are responsible for different functions and different receptors transmit signals to different parts of the brain. Incoming signals from interoceptors communicate, in particular, about deficiencies of matter and energy, and those from the exteroceptors about the situation in the habitat. Signals from the interoceptors entering into specific parts of the brain cause stimuli in these parts, i.e. they perturb specific neurons of the brain (brain cells).
A characteristic property of neurons is that the electric potentials (signals) developed by them during stimulation are not distinguished by magnitude: the signal equals either zero or its maximal value. This means that neurons "work" with the help of binary input--they are either perturbed or not, "1 or 0". Perturbed groups of neurons produce some "pattern" in the brain, i.e. they "concretize" a problem, for example the organism's need for matter. Thus, the place where the "pattern" arises and its form correspond to a specific problem of the system, and indicate exactly what the system needs. The development of the "pattern" activates brain activity, i.e. the psyche begins to "work". This means that the system begins to feel and to comprehend. The purpose of mental activity is the elimination of a problem arising in the system. Therefore mental activity is directed to the search for an algorithm for the system's behavior, behavior contributing to the achievement of the goal. The first step of mental activity is addressing a region of the brain such as the memory. The memory stores knowledge, i.e. "patterns" of those problems which arose earlier, and the system's behavior algorithms corresponding to these problems for specific conditions of the external environment. The latter is extremely important, since elimination of the problem is accomplished in the process of interaction with the environment, and depends on the behavior of the system in a concrete environment. Thus the signals about the external environment, perceivable by other parts of the brain--from the exteroceptors, are absolutely necessary. In those cases when both the problem and the state of the habitat are known to the system, i.e. when analogues of "interoceptor" and "exteroceptor" patterns are in memory, and when the system's memory contains a prepared behavior algorithm (also a pattern), the brain communicates signals to specific subsystems (organs) and this begins the process of a system's conscious behavior (activity). Here a mental process is carried out in a somewhat different form.
After the beginning of the activity in this process, control of each stage of activity is carried out (constant feedback which considers changes of the external environment as well as intermediate results of the activity, which can lead to correction of the behavior algorithm) until the problem is eliminated. In those cases where there is no prepared behavior algorithm in memory, the system tries to create one with the help of intellectual activity (one of the varieties of mental activity). This situation signifies that either the need of the organism is for something fundamentally new, i.e. a similar pattern has never before arisen in the system's brain, or the situation in the habitat is unusual and the system has no behavior experience in the new situation. The second case is more significant. What is new in the current state of the habitat cannot be used by behavior algorithms available in memory. This occurs because the new situations are not foreseen by the algorithms, and an uncertainty arises in the system, i.e. the system does not know how to behave in the new situation. Moreover, the system's attention is focused on this "essentially new" thing, and the system enters into a stable mental state, such as an interest in investigating the new thing, which in essence is one of the manifestations of the information need. Information for the system is everything which decreases uncertainty during the development of a behavior algorithm. An arising need initiates the algorithms and methods of investigating the new thing that are available in the system's memory. In this way, the system begins to act in the direction of investigating the new, in the direction of increasing knowledge. Accumulation of knowledge, i.e. the system's ability to foresee the effect of the new, is carried over from one new thing to another, until the algorithm for achievement of the goal is finally formed.
The above-presented process is one of the main processes providing the system's functioning. Notice that survival of the system depends not only on the timely and full supply of matter and energy. In other words, the information about the habitat is needed not only because of the need for matter and energy. Obviously, the habitat was not created expressly for man, and in addition to what he needs, it provides that which he does not need at all. More than that, it "provides" that which immediately threatens the system's life, it "provides" danger. For example, predators, poisonous animals and plants, natural calamities, etc. There are also "acquired" dangers such as crime, intensive traffic, etc. This reason alone would explain why the system constantly needs information about the habitat. It wants to know what can be expected from the environment. Therefore, we talk about the historically formed constant need for information about the state of the external environment. In the process of satisfaction of this "organic" need, knowledge, for example, about surmounting various dangers is accumulated, and in a number of cases the system lays out standard behavior algorithms worked out as a result of frequent application. For example, when crossing the street we usually look to the left, and making certain of the absence of danger, we begin our motion. This is done essentially "automatically", as if unconsciously, but having turned our head we attentively look to see whether an automobile is on the street. It should be noted that some primordial set of knowledge "produced" by a species is built into the system genetically. This is corroborated by the following experiment. The shadow of a vulture was shown to incubated chicks hatching out of eggs, and the chicks ran around in panic. However, the shadow of a dove did not cause any panic since they "knew" that it does not carry information about danger. 
We emphasize once more that the system's desire to survive is its final goal, whereas a need for matter, energy and information is its initial goal. Any human activity is eventually directed to satisfying these needs. We say "eventually" because this activity is not necessarily accomplished directly (in a single-step way), but is done indirectly (in a multi-step way). For example, work in a library should be considered as a labor activity typical of a man--activity directed to the creation of acceptable living conditions, i.e. the conditions which can ensure the continuous secure and sufficient supply of necessities for survival in the future (near and distant). It should be noted that both labor activity and care for the future are typical not only of man. For example, squirrels, gophers and other animals store a supply of food for the winter, and a social nature of labor activity for the future can be seen in ants, bees, etc. The specific algorithms of such activity are also created by species and are built into the system genetically. The present paper considers satisfying IN precisely in the context of labor activity. Thus, even a brief consideration of the basic mechanisms of survival shows rather clearly how insufficiencies of matter and energy on the one hand, and the desire to survive in the habitat on the other, are transformed into an information need. And in fact, the striving to survive is permeated with the need for information, a constant "hunt" for it. It was mentioned above that after the interoceptors send signals to the brain (unconscious information process), conscious information activity immediately begins. In fact, any purposeful actions, any conscious behavior or activity, are possible only as reactions to needs and only on the basis of available information. The more information the system has, the more chances it has to survive. It should be noted that something is information for a system only when the system takes it as information, as something that eliminates uncertainty from the behavior algorithm. The system itself imparts to this something the property "to be information".
Since the product of the satisfaction of an information need comes from and has meaning only within the framework of a concrete information need for a concrete system, it is possible to speak about the subjective character of the perception of the product, a character depending on the total set of patterns in the system's memory and on the concrete needs of the system. It is obvious that information for a given system can be something which will never be information for other systems, and that which is information for other systems may never be information for a given system.