Energy information and data retrieval


0360-5442/85 $3.00 + .00 © 1985 Pergamon Press Ltd.

Energy Vol. 10, No. 10, pp. 1145-1150, 1985. Printed in Great Britain.

ENERGY INFORMATION AND DATA RETRIEVAL

ALI BÜLENT CAMBEL, MARINO S. PEÑA-TAVERAS
School of Engineering and Applied Science, George Washington University, Washington, DC 20052, U.S.A.

and

CARL F. OLDSEN
University Systems Computer Center, Ohio State University, Columbus, OH 43210, U.S.A.

(Received 13 December 1984; received for publication 11 March 1985)

Abstract: The multidisciplinary nature of energy issues, coupled with the proliferation of publications in diverse journals, makes it difficult to review the literature in traditional ways. Fortunately, modern computer technology and library science bring rapid literature searches within the easy reach of individual researchers. The use of encyclopedic computer databases is discussed with particular reference to the DIALOG system. An appendix on equipment needs is included.

INTRODUCTION

In recent years, there has been a tremendous increase in the number of technical publications with which the engineer and scientist must keep up. A number of years ago, an experiment was conducted in which a bright young researcher was given an entire calendar year to do nothing but study the publications appearing in his own particular area of research. He found that he simply could not keep up with the avalanche of books, papers and reports appearing in his own field, let alone broaden his sphere of expertise, as a researcher must. One must also remember that, by the time a piece of research appears in print, it is already months, perhaps a year or two, old.

Other matters exacerbate the situation for modern researchers; these are listed in no particular order of priority: (i) The high cost of library acquisitions and operations has resulted in the curtailment of library acquisitions, so that not all useful publications can be found in any one institutional or organizational library (Ref. 1); (ii) The increased cost of library operations has resulted in a shortening of library hours, making it particularly inconvenient for the researcher who has to visit libraries outside his/her own grounds; (iii) Much of the useful research first appears in informal nonrefereed reports, which as a rule are sent to only a select list of persons.

Certain fields are multidisciplinary and, hence, pertinent publications may be found in a wide variety of journals or in different collections. For example, in the field of energy, it is common to cross traditional disciplinary lines and to deal not only with engineering but also with economic, social, institutional, and political aspects. Therefore, a relevant paper might appear in any of several journals or other types of publication, not only the obvious one, and a traditional library search can waste a great deal of time.
Fortunately, modern computer and library science make it possible to conduct a reasonably rigorous literature review within a matter of a few hours, if not minutes. This is possible because many publication reviews, such as Chemical Abstracts, have been put into computerized formats. By itself, this does not solve the problem of the researcher dealing with a multidisciplinary subject. However, a number of encyclopedic databases have been developed, such as BRS, DIALOG and ORBIT, which contain a wide spectrum of files, each of which covers thousands, and at times millions, of different publications, which in database vernacular are called records. While the use of the various databases is generally similar, we refer especially to the DIALOG Information System. The oldest among the various databases, DIALOG was founded in 1972 by the Lockheed Missiles & Space Company. Today, it offers some 200 files with about 60 × 10^6 records. DIALOG differs from information utilities such as THE SOURCE or COMPUSERVE in that one communicates with a computerized publications catalog, rather than obtaining information from, or sending messages to, another person, as can be done with MCI Mail.


Today, many institutional libraries provide such services and have one or more professional searchers on their staffs who will perform a machine literature search on a prescribed subject. While these services are useful, there are distinct advantages for the researcher in conducting his/her own machine search, chiefly greater flexibility. For example: (i) We can do our searches from our own desk, even at home, and at any time of our choosing; (ii) Conducting our own search enables us to decide more quickly the best direction to follow; (iii) It is less costly (Ref. 2). The present discussion constitutes only a brief summary, and the interested reader may wish to consult Refs. 3-6 for further details.

USING A DATABASE

The use of the various databases is substantially the same. In all cases, one needs a terminal or a mini- or micro-computer supported by the appropriate telecommunications programs, as well as a modem (see Appendix). The user also must have a password to the particular database. It is, of course, necessary to have a telephone line.

In conducting a machine literature search, the first order of business is to develop the search strategy. Suppose, for example, that we wish to study the impact of technological innovation on energy demands and environmental pollution. In conducting the search, one does not key in the statement in its given form but identifies the appropriate keywords or descriptors. This procedure requires proper interpretation of the subject prior to identifying the keywords. For example, in the context of our search, the word innovation could imply technical aspects of the type of innovation, economic shifts, or social and political issues. Furthermore, different authors use innovation, invention and discovery interchangeably. Hence, qualifications might be necessary. The word technological is frequently used interchangeably with technical, and the latter might have to be included. Energy takes various forms, such as fossil, fissile, and renewable. Environment also must be explored from different viewpoints, such as local or global. On the other hand, the various articles and the word study would be superfluous and should be left out of the strategy statement.

While developing the search statement, one must also be aware of the keywords or descriptors used in any one file. These can be found in the appropriate Thesaurus. In other words, the spirit of the search must be harmonized with the keywords used in the file residing in the database. One can then write a specific search statement with the use of the three Boolean connectives: and, or, and not. (See Ref. 7.) These are depicted in Figs. 1 through 3.
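The three connectives correspond to ordinary set operations on the records indexed under each keyword. The following minimal sketch, in Python, illustrates this with an invented index of accession numbers; the data and the function names are hypothetical and do not represent how DIALOG itself is implemented.

```python
# Hypothetical index: keyword -> set of accession numbers (invented data).
index = {
    "innovation": {1, 2, 3, 5},
    "invention": {3, 4},
    "energy": {2, 3, 4, 6},
    "nuclear": {4, 6},
}

def select_and(a, b):
    """Records indexed under both keywords (A and B)."""
    return index[a] & index[b]

def select_or(a, b):
    """Records indexed under either keyword (A or B)."""
    return index[a] | index[b]

def select_not(a, b):
    """Records indexed under A but not under B (A and not B)."""
    return index[a] - index[b]

both = select_and("innovation", "energy")
either = select_or("innovation", "invention")
non_nuclear = select_not("energy", "nuclear")
print(sorted(both), sorted(either), sorted(non_nuclear))
```

With this toy index, "or" widens the result to include synonyms, while "and" and "not" narrow it, which is the behavior a search strategy relies on.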
We are now ready for the actual machine search. Only DIALOG is considered here.† Completing the search strategy before connecting with DIALOG is important because charges accrue from the moment the connection is made. While the prices of the different files offered by DIALOG vary widely, one can assume an average of $1.50/min. While this may seem high, the rapidity with which the host (DIALOG) computers search the file makes the final literature search very cheap.

† A password to DIALOG may be obtained by writing to DIALOG Information Services, Inc., 3460 Hillview Avenue, Palo Alto, CA 94304, USA, or telephoning 800-227-1927.

The procedure is then substantially as follows. First, one looks up the directory of packet switching lines (see Appendix) and dials the closest one. When the connect message appears on the screen or is printed by the printer, one presses the enter or return key (CR) twice. The next query is the type of terminal used. One obtains the appropriate symbol from another listing provided by DIALOG. We type this, e.g. Dl, and press (CR), but this time only once. We next enter our password, which is generally an alpha-numeric code, and again (CR). At this point there will appear a number of messages that DIALOG wishes to pass on to its users. When these are completed, one should immediately key in b, for begin, and the number of the file to be searched, which is obtained from the regularly updated DIALOG Catalog describing the various files. However, particularly in multidisciplinary cases, it can be difficult to identify which particular files to search. Hence, it is frequently advantageous to ask File 411, DIALINDEX, to provide this information. This in essence is no different from using the index in the back of a book, except that instead of pages one gets the appropriate file numbers. Accordingly, one keys in b 411 and once again (CR). DIALOG will immediately respond with the unit on-line cost of computer time, to the second decimal place in dollars, and the unit cost of the telecommunications lines. When this is completed, one keys in S, for select, and the keywords or descriptors of the strategy already completed. DIALOG will then search for those keywords in its files (estimated to be of the order of 200 and containing about 60 × 10^6 records) and within seconds list the applicable files. From this list one opens each desired file by once again keying b, the file number, and (CR). DIALOG will now give the unit price for that particular file and signal that it is ready for a message with the symbol "?". We then key in SS and the keywords, for superselect, and (CR). DIALOG will then answer with a list that contains the number of publications for each keyword and the number of the set, as well as the number of the combinations.

Fig. 1. The "and" operation for two variables: A ∩ B = f(A, B) = C. (a) Circuit diagram; (b) logic diagram; (c) truth table.

As a rule, the number of entries alone is not sufficient for the researcher, who will want to know about the contents of the various entries. A variety of options are allowed and are listed in Table 1, which is taken from Ref. 8. Record formats are stored in the computer so that one can select data elements in response to particular needs. The Browsing formats display titles and/or indexing terms. This is useful for gaining insight into the topics of the subject matter in question and for noting the keywords that were assigned to index it. It provides a quick way to determine whether the search is proceeding in the right direction. Short form formats are useful for ascertaining the bibliographic data if it is desired to locate the physical document. They provide the essential data for constructing a bibliography, including the author, title, publisher, page number(s), source, and related information. End-user formats provide a copy of the entire record, including the previously described formats plus the abstract or narrative description of the article.
In lieu of the actual document, this is a useful surrogate on which we rely to decide whether or not to pursue obtaining the actual publication. The full record with abstract, if concise and well written, can provide a snapshot of the publication's contents. The Miscellaneous formats provide for listing the accession number, a sequential number assigned to each distinct record in the database. Accession number listings are useful for record keeping purposes, for shelf arrangement checklists, and for comparison against other similar listings. These formats can also be used to meet unique display needs, such as publisher lists where publications from a specific source are required, or where the publication date is important. Should one wish to obtain a hard copy of an entire publication, one can so signal DIALOG and obtain a photocopy in about one week. Alternatively, one can look for that particular reference in the library of one's choosing.

While the search can be conducted visually on a CRT screen alone, there are disadvantages to doing so and a printer is recommended. The primary reasons are that the inputs from the database stream by in rapid order, even at 300 baud, and the searcher will have trouble comprehending them. Reading from the screen also demands excessive concentration and is tiring on the eyes. The advantage of the printer is that, at the end of the search, one will have a printed record which can be studied subsequently with greater care and referred back to should the need arise.

Fig. 2. The "or" operation for two variables: A ∪ B = f(A, B) = A + B = C. (a) Circuit diagram; (c) truth table.

Fig. 3. The "and not" operation: A ∩ B̄ = f(A, B) = C. (a) Circuit diagram; (c) truth table.

A specific example might be helpful and is taken here from Ref. 9. Let us search for publications dealing with solar energy. Because DIALOG will be searching for solar and energy separately, and then for their combination, it would appear that the strategy might be simply solar and energy. However, depending on the author's tastes, it is conceivable that the word sun is used instead of solar. Hence, we would include in our strategy solar or sun. On the other hand, there are numerous forms of energy, such as fossil, nuclear, fusion, etc. These we need to exclude from the search. Hence, we should use the connective "not". The associated Venn diagrams, reproduced here from Ref. 9, are shown in Figs. 4-6, where the rectangle denotes the universe of the publications in that particular DIALOG file. It should be noted that the radii of the circles need not be drawn in proportion to the number of records in the file. In Fig. 4 are listed some symbols and alpha-numerics. The question mark denotes that DIALOG queried the searcher for his/her wishes, to which he/she replied "select solar and energy", and DIALOG indicated that in that particular file there were 15014 records, i.e. entries, that contained the word solar and 79270 that mentioned the word energy. In turn, there were 5163 records containing both solar and energy. As expected, the combination contains fewer records than either keyword separately. The number 10 in the last line is the set number that DIALOG assigned to the combination "solar and energy" in this particular search.
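The record counts that DIALOG reports for this example (Figs. 4-6) obey ordinary set arithmetic, which offers a quick consistency check on a search. The short Python sketch below is an illustration only: the counts are those printed in the figures, while the function names are ours and are not DIALOG commands.

```python
# Record counts as printed in Figs. 4-6 for one DIALOG file.
solar, sun, energy, nuclear = 15014, 1081, 79270, 53220
solar_and_energy = 5163      # set 10
solar_or_sun = 15460         # set 11
energy_not_nuclear = 66616   # set 12

def overlap_from_or(a, b, a_or_b):
    """|A and B| = |A| + |B| - |A or B| (inclusion-exclusion)."""
    return a + b - a_or_b

def overlap_from_not(a, a_not_b):
    """|A and B| = |A| - |A not B|."""
    return a - a_not_b

# Records that use both "solar" and "sun".
solar_and_sun = overlap_from_or(solar, sun, solar_or_sun)
# Records indexed under both "energy" and "nuclear".
energy_and_nuclear = overlap_from_not(energy, energy_not_nuclear)
# Records indexed under "nuclear" but not "energy" (e.g. nuclear medicine).
nuclear_not_energy = nuclear - energy_and_nuclear

print(solar_and_sun, energy_and_nuclear, nuclear_not_energy)
```

Under these figures, 635 records use both solar and sun, 12654 records carry both energy and nuclear, and the remaining 40566 nuclear records lie outside the energy set.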
The set number constitutes an identification number, and in any modification or extension of this particular search one can use it instead of solar and energy, thereby reducing the time spent on line. The explanation is quite similar in the case of the "or" function, which is shown in Fig. 5. In this figure the combined set includes a greater number of records than the larger of the two individual sets. This is because some, but not all, of the records used both keywords. In Fig. 6, the number of combined records is again greater than the smaller of the two individual counts, but this time it is because energy includes publications using the descriptor nuclear, while there are also other records containing the descriptor nuclear which are not related to energy, such as, for example, nuclear medicine. Perhaps a better approach would have been to use the DIALOG command SS, i.e., superselect. In this case, an individual set number would have been assigned to each descriptor and more refined searches could have been conducted by combining the set numbers. For details of the many additional search refinements, the reader may wish to consult Refs. 8-10.

Table 1. Record formats commonly used in bibliographic databases.†

Category        Format     Contents
Browsing        Format 6   Title only
                Format 8   Title and indexing
Short form      Format 3   Bibliographic citation
                Format 2   Full record except abstract
End-user        Format 7   Bibliographic citation and abstract
                Format 5   Full record (includes abstract, if any)
Miscellaneous   Format 1   Accession number only
                Format 4   File dependent

? SELECT SOLAR AND ENERGY
         15014  SOLAR
         79270  ENERGY
    10    5163  SOLAR AND ENERGY

Fig. 4. The "and" function for solar and energy (Ref. 9).† Venn diagram: SOLAR AND ENERGY (5,163 records).

? SELECT SOLAR OR SUN
         15014  SOLAR
          1081  SUN
    11   15460  SOLAR OR SUN

Fig. 5. The "or" function for solar or sun (Ref. 9).† Venn diagram: SOLAR OR SUN (15,460 records).

? SELECT ENERGY NOT NUCLEAR
         79270  ENERGY
         53220  NUCLEAR
    12   66616  ENERGY NOT NUCLEAR

Fig. 6. The "and not" function for energy and not nuclear (Ref. 9).† Venn diagram: ENERGY (79,270 records), NUCLEAR (53,220 records), ENERGY NOT NUCLEAR (66,616 records).

Acknowledgments: The authors express their thanks to E. B. Heyward for the production of the manuscript.

† Data Copyright 1985 DIALOG Information Services, Inc. Reproduced with permission.


REFERENCES

1. G. Black, Science, 213, (4509) (August 14, 1981).
2. J. Sandy, Science, 216, (4553) (June 25, 1982).
3. On Line Search Strategies, Edited by Ryan E. Hoover. Knowledge Industry Publications, White Plains, NY (1982).
4. A. Glossbrenner, The Complete Handbook of Personal Computer Communications. St. Martin's Press, New York, NY (1983).
5. M. Lesko, The Computer Data and Database Source Book. Avon Books, New York, NY (1984).
6. D. Stoner, Personal Telecomputing. COMPUTE Publications, Inc., North Carolina (1984).
7. J. E. Whitesitt, Boolean Algebra and its Applications. Addison-Wesley Publishers, Reading, Massachusetts (1961).
8†. System Seminar, DIALOG Information Retrieval Service, Palo Alto, CA (1982).
9†. Guide to DIALOG Searching, Lockheed DIALOG Information Retrieval Service, Palo Alto, CA (1982).
10†. DIALOG Lab Workbook, 3rd Edn., DIALOG Information Retrieval Service, Palo Alto, CA (1981).

APPENDIX

The modem

In order to conduct a machine literature survey, certain equipment is necessary. Of course, primary among this is a terminal or a computer supported by the appropriate telecommunications software program. Because computers are basic to the researcher's kit of tools, it is not necessary to discuss them. However, our experience indicates that many persons having direct access to mini- or micro-computers do not have available to them a very crucial item, namely, the so-called modem. The word modem derives from a combination of two others, namely modulator and demodulator. The modem is required because the signals from a terminal or computer are in digital form, whereas the telephone lines used for at least part of the communications leg are designed for voice communications and, hence, for analog signals. The modulator converts the digital signals into analog signals and the demodulator converts these back to digital signals to be accepted by the host computer, in the present case the database or the telecommunications lines.
Modems combine the modulator and demodulator in one unit. Basically, there are two types of modems: internal and external. The modem is connected to the telephone line either by an acoustic coupler or electronically. Further, depending on the modem design, dialing may be accomplished either electronically or manually. There is no one best modem; each has certain advantages and disadvantages.

In initiating a search, it is necessary to set the communications parameters properly. These are determined by the host computer, the packet switching system, and the sending computer and, hence, some fact finding can be expected. Here the crucial factor is first the baud rate, which determines the speed at which the signal is transmitted. In general, 1 baud equals 1 digit or 1 symbol/sec. Specifically, in the binary system 1 baud = 1 bit/sec. In database searches the two most commonly used baud rates are 300 and 1200 baud. When a computer is used for a search, it is not the computer itself that limits the baud rate, but rather the modem or the telecommunications network. Modems having transmission speeds considerably higher than 1200 baud are available commercially, but are not used unless there are engineered, i.e. specially designed and installed, communications lines.

Probably the next communications parameter that must be set is the parity bit. Because in computer communications one generally uses the serial port, it is necessary to provide a test to ensure that the proper signal bits are transmitted. Depending on the particular situation, the parity may be set to odd, even, none, or ignored. The parity status is determined by the word length, which is generally 7 or 8. Other parameters to be set are the stop bit, which ends any one signal transmission and may be 1 or 2; the control according to the x parameter, which may be on or off; and the control according to the shift input/output sequence, which may also be on or off. In using DIALOG one does not need to be concerned about these last three. Finally, it should be remembered to set the modem to full duplex because communications with DIALOG are two-way signals.

Communicating with DIALOG or any other database through standard telephone lines can be financially prohibitive. Accordingly, one takes advantage of digital networks, also called packet or value added networks, which, of course, are not suited to voice communications. Among these, the ARPANET is operated for government use and is not broadly available. However, there are commercially available networks, also called value added carriers, which charge the user a relatively modest fee. These include: WATS, TELENET, TYMNET and UNINET. (For searches originating outside the USA, DIALNET can be used in certain locales.) In cities with a population over 50,000, one or another of the various networks can be reached by means of a local telephone call. These local numbers, to be found in a special directory, are called nodes. In general, the telecommunications networks transmit digital signals at the rate of 9600 baud. Therefore, 32 users with 300 baud modems can be accommodated by the carrier at any one time, whereas with 1200 baud modems only eight users can be accommodated at one time. It is for this reason that there is a premium price associated with telecommunications at the higher baud rates. Clearly, the data bank searcher who has a modem that allows for different transmission speeds must evaluate whether or not the time saved is worth the higher communications cost, which, of course, is part of the total search cost.

In order to communicate with another computer such as a data bank, it is not necessary to use a computer; a dumb terminal will do quite well. In some ways this may be simpler to use because one need not set as many parameters. As a rule, dumb terminals receive their messages in print, whereas computers converted into terminals by means of suitable software programs can use their printers and/or their CRT monitors. However, while the CRT screen is convenient for scanning the search results, it does not allow for permanent retrieval as does the printout. Some searchers who periodically update the information and/or data gathered transfer these onto a disk for future updating or printing.
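The parity setting and the line-sharing figures quoted in this appendix can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: the function names are ours, a 7-bit word length is assumed for the parity example, and the 10 bits per character used in the timing estimate (7 data bits, 1 parity bit, 1 start bit and 1 stop bit) is an assumed framing, not a figure given in the text.

```python
def parity_bit(char, mode="even"):
    """Parity bit for a 7-bit character under even or odd parity."""
    code = ord(char)
    if code > 0x7F:
        raise ValueError("expected a 7-bit character")
    ones = bin(code).count("1")
    even_bit = ones % 2  # makes the total number of 1 bits even
    return even_bit if mode == "even" else 1 - even_bit

def users_per_line(line_baud, modem_baud):
    """Number of modems of a given speed that share one network line."""
    return line_baud // modem_baud

def seconds_to_send(n_chars, baud, bits_per_char=10):
    """Transmission time; bits_per_char=10 is an assumed framing."""
    return n_chars * bits_per_char / baud

print(parity_bit("A"), parity_bit("C"))   # 'A' has two 1 bits, 'C' has three
print(users_per_line(9600, 300), users_per_line(9600, 1200))
print(seconds_to_send(1200, 300), seconds_to_send(1200, 1200))
```

Under the assumed framing, a 1200-character record takes 40 seconds at 300 baud and 10 seconds at 1200 baud, which is the time saving the searcher must weigh against the premium charged for the higher rate.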
