Computer network resources and advanced level teaching of biochemical sciences


Features Section: Computer Aided Learning

Editor: R P Learmonth, University of New England, Australia

This issue continues the theme of electronic publication and information available on computer networks. In addition, Drs Tan and Kiong of the National University of Singapore have provided a comprehensive paper showing how electronic communications and resources may be extensively integrated into biochemistry teaching. The paper also includes a useful summary of Internet resources relevant to biochemists. We are fortunate in having a steady stream of high quality articles submitted in this section, reflecting the healthy interest in the application of computers to teaching and learning.

Publication Directory Available

Directory of Electronic Publications, Third edition, Association of Research Libraries. ISSN: 1057-1337, US$42 plus postage

The third edition of this directory includes information on 1152 scholarly lists and 240 electronic journals, newsletters and related titles including newsletter digests. The directory is available in printed form, and will also be available on disk. For further information contact the Features Editor or send an email message to Ann Okerson at [email protected].

Journal Review

Livestock Research for Rural Development. Editors T R Preston and H H Orsorio, Centro para la Investigación en Sistemas Sostenibles de Producción Agropecuaria (CIPAV), Cali, Colombia

Requirements: IBM compatible, 256 Kbytes memory, one floppy disk drive (5.25 or 3.5 inch), MSDOS v2.1 or later, monochrome non-graphics screen or better

Livestock Research for Rural Development is an international, refereed scientific journal distributed by electronic means. The journal was started in 1989 to publish the work of young research workers in developing countries. Papers are published in English, French, Portuguese or Spanish, with summaries in English also provided. Journal subscriptions are distributed on floppy disk or via the APC (Association for Progressive Communications) computer networks or the Internet (via anonymous ftp to trans.plants.ox.ac.uk). The journal is distributed free to workers in developing countries and for a modest subscription fee to individuals and organisations in Europe, the USA and other developed countries.

BIOCHEMICAL EDUCATION 22(1) 1994

An issue consists of 10 papers, distributed on floppy disk. There are a total of 134 topics to June 93, and in future individual papers will be published on the computer networks immediately they are accepted, thus shortening publication times dramatically. Each issue consists of a series of plain text files, with a program which reads and presents a table of contents and allows selection, viewing and printing of individual articles. Notes to authors and readers and current news items are also provided. One question I had was how graphics could be incorporated in a system with such modest computer hardware requirements. The ingenious solution to this problem has been to include data in tables only, with the suggestion that readers may like to extract tabulated data and plot them using their own graphics program. A further advantage is that the numerical data may also be further processed by statistical analysis. This journal has broken new ground in electronic publication and has been quite successful, now in its fifth year. We can look forward to this model becoming a mainstream form of rapid communication of scientific papers.

Computer Network Resources and Advanced Level Teaching of Biochemical Sciences

TIN-WEE TAN 1,a and DEREK BENG-KEE KIONG 2,a

1 Department of Biochemistry, 2 Department of Information Systems and Computer Science, and a Biocomputing Research and User Support, Biocomputing Interest Group, National University of Singapore, Kent Ridge, Singapore 0511

Introduction

The biochemistry curriculum offered by the Department of Biochemistry, National University of Singapore is a two-year course beginning with second year Science students. Outstanding students are permitted to proceed to the fourth year honours course, which includes a research project where the student has a hands-on approach to both theoretical and practical aspects of research. Initially, the students carry out a literature review of the topic. Hence, access to up-to-date primary and secondary sources of scientific information and literature is essential. However, being geographically distant from journal publishers, we suffer from the usual problems of delivery. Moreover, as more of biochemical research takes on an informational and computationally analytical character, the need to introduce students to basic software and databases in computational molecular biology is obvious.

Previously, the cost of browsing commercial databases, typically through our library facilities, made it difficult to justify access for all but funded users. The cost of proprietary biocomputing software is also relatively high. In recent years, the trend has been for the ordinary user, particularly the student, to enjoy the benefits of direct access to databases and biocomputing software as a basic necessity. It is therefore important that all available resources on the computer network, including the local area network (LAN) and the wide area networks, are assembled in an organized and integrated fashion for easy access. The immediate goal is to free the user from the complexities of manipulating computer commands in order to concentrate on the science. Over the past three years, we have taken a two-pronged approach to assist our students: firstly to catalogue, acquire and implement network information and software resources, and secondly to develop a simple unified user interface that will access these diverse resources.

Local Area Network at NUS

Since 1990, computers throughout the National University of Singapore (NUS) campus have been connected via a local area network (LAN) called NUSNET. User terminals are connected to DOS or Macintosh servers or directly to Unix, VAX or IBM mainframes. Users may also dial up toll-free via a modem to the various network servers or mainframes. Both staff and students have access to LAN resources. In particular, the library on-line catalogue system, NUSLINC, is available on NUSNET such that remote users may log into a captive public NUSLINC account from their NUSNET account. Gradually, more services such as Science Citation Index and MEDLINE abstracts (Compact Cambridge) were added. The interface is flexible and user-friendly. In the course of the past three years it permitted us, as co-ordinators of the Biocomputing Interest Group (BIG) at NUS, to add menu options to the interface that lead to various biocomputing software which our students were taught to use. Software packages taught included a reference database management program, REFERENCE MANAGER (Research Information Systems, Inc), used to organize bibliographies downloaded from the MEDLINE abstracts service, and programs that facilitate the production of research reports. Mainly obtained free from the INTERNET public domain archives, PC-based DOS biocomputing software currently supported on NUSNET includes popular programs such as AUTHORIN, CLUSTALV, COMAP, EASYCLONER, ELBAMAP, EZ-FIT, PHYLIP, PRIMER, PROFILEGRAPH, PROSEARCH, READSEQ, SEQAID and SIGSEQ. As much of biochemical research today takes on a molecular biology character, we believe that exposure to such programs will give students the confidence to handle problems of computational molecular biology, whether on its own or as a research tool. References to these programs may be found in the review by Tan 1 or directly from the documentation accompanying the software downloaded from various sources (see below under mail servers and anonymous FTP).

Global Networking on the INTERNET

From 1990 to 1991, using our BITNET connection to network bulletin boards, particularly the BIOSCI/BIONET newsgroups, 2 and from 1991 onwards, using the INTERNET connection, information on all resources pertaining to bioscience research was accumulated. INTERNET 3 today is a network of more than 7000 computer networks (about 1 million computers) spanning the globe. Each computer may be shared by hundreds of users. Many of these computer hosts belong to institutions of research and tertiary education but a growing number of schools, especially in the United States, are beginning to acquire INTERNET access. As soon as our INTERNET connection was implemented, the shift towards Unix platforms began. Since the Unix operating system is not the most penetrable platform for the novice, considerable effort was spent on providing a user-friendly Unix shell script interface, BIOMENU, and later on, a hypertext-like C program menu system, HYBIOMENU. 4,5 These programs unified the diverse modes of accessing network resources, as briefly described below, for the use of our students.

Conventional Network Information Retrieval

There are many conventional methods of access to information and software on the network. 3 The user may log in to a remote computer (using telnet), send electronic mail to colleagues or read bulletin board messages and engage in discussions, and transfer document and program files across global networks (anonymous File Transfer Protocol, FTP).

Remote Login

Often, managers at INTERNET sites are kind enough to provide guest access, typically captive accounts which provide information retrieval services only. An example of this is the Genome Data Base (GDB) hosted at the Johns Hopkins University School of Medicine in Baltimore, for the search and retrieval of gene maps and associated data from the On-line Mendelian Inheritance in Man (OMIM) database. 6 For browsing through library catalogs, more than 400 libraries throughout the world offer free logins to their online systems (see for instance HYTELNET by Peter Scott 7).

Network Newsgroups and Bulletin Boards

Computer network news is another useful information resource which models conventional notice boards. In the case of network newsgroups, postings are divided into logical groups (currently, more than 2000) and are broadcast to subscribing computer hosts over INTERNET. For the biological sciences there are more than 20 groups for discussing topics including biochemical and molecular biology techniques, biocomputing software, use of network resources, as well as specialist discussion groups on topics including ageing, agroforestry, Arabidopsis research, immunology, molecular evolution, neurosciences, plant research, population biology, protein crystallography, virology and so on. These come under the BIOSCI/BIONET network newsgroups 2 to which hundreds of bioscience researchers subscribe. Therefore, if a problem is announced, it is likely that somebody out there has experience with it and probably the solution.

Electronic Mail

While news articles are broadcast to all subscribers and virtually the whole INTERNET world can read them, electronic mail is more private and is analogous to ordinary postal letters. The speed of transmission makes this an invaluable and cheap medium to exchange ideas with fellow researchers informally. Email also provides an alternative mechanism for subscribing to the bulletin boards mentioned above. 2

Electronic Mail Servers

Instead of replying by electronic mail individually, the information provider might automate the procedures for interpreting the request and sending the desired information to the requester. This automatic procedure is known as a mail server. Essentially, requests for information are sent via electronic mail to a non-user account. One useful example would be the free service provided by the National Center for Biotechnology Information (NCBI) to help researchers search the databanks for DNA or protein sequences similar to a query sequence using the BLAST programs of Altschul and others. 8 A European example would be the EMBL file server service. 9 These services are among the more popular with our staff and students. Available from the authors by email or by post is a table of useful mail servers on BITNET and INTERNET, which includes data from M Gribskov (1991), Amos Bairoch and Una Smith, 10 from their postings to the BIOSCI newsgroups. The table contains nearly two dozen server sites at research and academic institutions, the corresponding network email addresses, and the services, resources and databases offered.

Anonymous FTP

Often, INTERNET site managers are also generous in providing space for archives. These are repositories of electronic data ranging from public domain software and technical reports to instruction manuals and other commonly requested documents. A table of more than two dozen FTP sites useful to biologists is available from the authors by email or by post. Together, these sites offer more than 30 different biological databases and hundreds of useful programs. The ARCHIE program 3 can be used to identify the computer site holding the desired program. In the context of the biosciences, it is possible to download a wide range of databases including the frequent updates from GenBank 11 and EMBL 12 (although this is not practical for the large databases because of the huge demand on network traffic).
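As a present-day illustration of the anonymous FTP access described above, the following is a minimal sketch in Python using the standard-library ftplib module. The function name list_archive and the FakeFTP stand-in are hypothetical and exist only so that the example can run without a network connection; the host and directory in the closing note come from reference 7 and may no longer be reachable.

```python
from ftplib import FTP  # only needed when connecting to a real archive


def list_archive(ftp, directory):
    """Log in anonymously, enter `directory` and return its file names, sorted."""
    ftp.login()          # with no arguments, ftplib logs in as user 'anonymous'
    ftp.cwd(directory)   # change to the archive directory
    names = ftp.nlst()   # plain list of names in that directory
    ftp.quit()
    return sorted(names)


class FakeFTP:
    """Hypothetical offline stand-in offering the four methods used above."""

    def __init__(self, tree):
        self.tree = tree          # maps a directory name to its file names
        self.current = ""
        self.logged_in = False

    def login(self):
        self.logged_in = True

    def cwd(self, directory):
        if directory not in self.tree:
            raise FileNotFoundError(directory)
        self.current = directory

    def nlst(self):
        return list(self.tree[self.current])

    def quit(self):
        self.logged_in = False


# Invented directory contents, for demonstration only.
demo = FakeFTP({"pub/example": ["readme.txt", "program.zip"]})
listing = list_archive(demo, "pub/example")
```

Against a live archive the same function would be called with a real connection, for example list_archive(FTP("ftp.usask.ca"), "pub/hytelnet"), assuming the site still accepts anonymous logins.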

Recent Information Retrieval Systems

Some of the above methods demand considerable computer proficiency, which limits their use to the highly motivated novice or the computer professional. In the last two years, new information retrieval systems have emerged, offering a simpler, more intuitive approach. These include the Wide Area Information Servers (WAIS, pronounced 'ways'), 3 the GOPHER information retrieval system 3 and the World Wide Web (WWW) 3 distributed hypertext project, as described below, all of which are freely available.

WAIS distributed database search

Most databases reside on separate programs, platforms and machines. The WAIS system has an interesting feature that allows several databases to be queried at one time. The retrieved information is collected, collated and presented to the user. It also provides relevance feedback methods of information search. Within a year of its introduction, hundreds of WAIS servers throughout the world were offering a wide range of information, including much related to the biosciences, most of it free. We have, for instance, indexed the PROSITE protein sequence motifs database of Bairoch 13 and the GENPEPT database (translated GENBANK), and have provided the data as WAIS sources. A table is available from the authors which describes more than 50 biological databases currently accessible as WAIS sources. They range from major molecular biology databases such as GenBank and EMBL to biochemical databases such as ENZYME.
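The idea of querying several databases at one time and collating the results can be illustrated with a toy sketch in Python. This is not the WAIS protocol or its ranking method, and it omits relevance feedback; the source names, the records and the simple word-overlap score are all invented for illustration.

```python
def score(query, text):
    """Crude relevance score: number of distinct query words found in the text."""
    words = set(text.lower().split())
    return sum(1 for word in set(query.lower().split()) if word in words)


def search_all(query, sources):
    """Query every source, then collate the hits into one ranked list."""
    hits = []
    for name, records in sources.items():
        for record in records:
            relevance = score(query, record)
            if relevance > 0:
                hits.append((relevance, name, record))
    # Highest score first; ties are broken alphabetically by source and record.
    hits.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return hits


# Hypothetical sources standing in for separately hosted databases.
SOURCES = {
    "motifs": ["zinc finger motif", "kinase phosphorylation site"],
    "enzymes": ["protein kinase", "alcohol dehydrogenase zinc"],
}

results = search_all("zinc finger", SOURCES)
for relevance, source, record in results:
    print(relevance, source, record)
```

The user sees a single ranked list regardless of which source each record came from, which is the convenience the paragraph above describes.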

GOPHER distributed directory browser system

GOPHER is a new network tool that permits the user to browse through resources located on different systems on the network simply by choosing items from a menu. The system is very intuitive, and is compatible with WAIS databases, which can be accessed from a GOPHER menu. Almost all the essential biocomputing resources have been 'gopherised' in the past six months. On our part, we have set up a GOPHER server to provide links to our own locally maintained biodatabases as well as to those overseas. The authors may be contacted for a list of several scores of biological GOPHER sites, offering thousands of information items and search-and-retrieval of dozens of biological databases (see ref 14 for descriptions of these databases).
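To show what sits behind such a menu, here is a minimal Python sketch that parses a Gopher menu in the tab-separated layout defined by the Gopher protocol specification (RFC 1436): a one-character item type, then the display title, selector, host and port, with a lone full stop ending the menu. The sample menu, its selectors and the host gopher.example.org are invented, and the names MenuItem and parse_menu are hypothetical.

```python
from typing import NamedTuple


class MenuItem(NamedTuple):
    item_type: str   # '0' text file, '1' directory (menu), '7' index search
    title: str
    selector: str
    host: str
    port: int


def parse_menu(raw):
    """Turn the raw text of a Gopher menu into a list of MenuItem records."""
    items = []
    for line in raw.splitlines():
        if line == ".":            # a lone full stop terminates the menu
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4:        # skip malformed lines
            continue
        items.append(
            MenuItem(line[0], fields[0], fields[1], fields[2], int(fields[3]))
        )
    return items


# Invented sample menu; a real one would be read from a server connection.
SAMPLE = (
    "1Biological databases\t/bio\tgopher.example.org\t70\r\n"
    "7Search sequence motifs\t/search\tgopher.example.org\t70\r\n"
    ".\r\n"
)

menu = parse_menu(SAMPLE)
```

Choosing an item then amounts to sending its selector to the listed host and port, which is why a single menu can point to resources on many different machines.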

WWW distributed hypertext/hypermedia

The World Wide Web (WWW), a project initiated by the European Laboratory for Particle Physics (CERN), is a global computer networked, hypertext/hypermedia system which accesses documents on remote locations, such that to the user, it appears as if the information were part of a contiguous piece of text. As this system was only introduced very recently, we have only set up the software and implemented a WWW client system. The amount of biological data on WWW is increasing rapidly.

For a more detailed account of the modes of access and the network resources, the reader is directed to the excellent guide book by Krol, 3 to the review by Tan, 1 the Internet document by Smith 10 and to the online context-sensitive help files available on BIOMENU and HYBIOMENU on our Unix mainframe computers. Interested readers are invited to contact the authors for a temporary guest account ([email protected] or [email protected], sg).

The Biocomputing Course Module for Biochemistry Honours students: three years of experience

For the past three years, our biochemistry honours students have participated in an evolving biocomputing course module. Initially, students were merely taught to use various public domain molecular biology programs retrieved from the network in a stand-alone, piecemeal fashion, exclusively on the PC-DOS platform. However, the low level of computer network literacy among our students and the bewildering range of software and their equally diverse interfaces indicated that a user-friendly menu interface would help students tremendously. In the second year, a simple interface was set up to assist students. Again, this was confined to the PC-DOS platform with a brief introduction and demonstration of the INTERNET facilities on the Unix platform. In the third implementation of the biocomputing module, a more organized approach was adopted with the use of menu interfaces on both the DOS and Unix platforms.

Scheduled at the beginning of the honours course, the current biocomputing module has two one-week parts. These were held at our computer laboratories where each student is given a user account and assigned to a networked PC. Teaching methods included lecture-demonstrations using an overhead LCD projection plate and colour television graphics projection system, tutorial discussions, supervised practice on the PC and problem solving. The first component on information retrieval dealt with the access to the various database resources mentioned above. It must be stressed that these resources are not restricted to privileged and/or funded users but are open to anyone with an account. Currently, all staff and students are eligible to register for accounts on all computer systems managed by the NUS Computer Centre. The second component dealt with the use of a core of databases and computer programs for computational molecular biology. The theory of selected programs was briefly dealt with in a non-mathematical approach. The main aim was to provide hands-on practice and to equip students with the tools to cope with the demands of their research project later in the academic year and possibly in the future. Not all computer tools were covered in depth, as the purpose was not to make students experts in handling the programs but to assure them of the availability of the tools.

One of the problems identified in this third course was that the user could not traverse the DOS and Unix dual platforms in a seamlessly transparent fashion. This contributed to confusion, particularly among computer-naive students. In dealing with this problem, we have designed a series of menu systems that permit users to traverse from their initial login DOS server to the Unix mainframes. Currently work is in progress to develop a windows/mouse-operable program that will unify the DOS and Unix interface.

The Future

One objective in our course is to ensure a basic awareness of biological information retrieval systems and biocomputing software. This is because biological research today utilizes computer tools as part of an indispensable range of laboratory techniques. To that end, the biocomputing course module is slowly evolving a syllabus that will put our graduates on par with those from other biochemistry departments where computer and network literacy is concerned. Moreover, it is anticipated that the currently limited repertoire of on-line bioscience journals/books will soon increase in number and decrease in cost. The hardware, such as high-resolution graphics terminals, essential for supporting such services and three-dimensional molecular modelling should begin to proliferate. Information technology, therefore, has begun to take root in our teaching of biological sciences.


Acknowledgements

The authors acknowledge the support of the staff of the Computer Centre, National University of Singapore and the encouragement of colleagues in establishing a campus network of biocomputing resources. We thank Fredj Tekaia of the Services d'Informatique Scientifique (SIS), Institut Pasteur, Paris, for testing the HYBIOMENU interface on his users. This project is run by the volunteer Biocomputing Research and User Support (BRUS) team comprising the authors (Internet email address: [email protected] or [email protected]). Parts of this paper have been presented at the Seventh World Congress for Medical Informatics MEDINFO'92, Geneva, September 1992 and at the in-house seminar, Teaching in Science, Faculty of Science, NUS, Singapore, November 1992.

References

1 Tan, T W (1992) in 'MEDINFO'92. Proceedings of the Seventh World Congress on Medical Informatics, Geneva' (edited by Lun, K C, et al), pp 66-71, North-Holland, Amsterdam
2 Bleasby, A J, Griffiths, P, Hines, D P, Marshall, S E and Stanniford, L (1992) Binary 4, 162-163
3 Krol, E (1992) The Whole Internet: User's Guide and Catalog. O'Reilly & Associates, Inc, Sebastopol
4 Kiong, B K and Tan, T W (1993) Comput Applic Biosci 9, 211-214
5 Kiong, B K and Tan, T W (1993) Comput Applic Biosci 9, 581-586
6 McKusick, V A (1990) Mendelian inheritance in man, 9th edition. Johns Hopkins University Press, Baltimore
7 Scott, P (1992) HYTELNET version 6.4. Published electronically on INTERNET. Freely available by anonymous FTP from ftp.usask.ca (IP 128.233.3.11) in the directory pub/hytelnet
8 Altschul, S F, Gish, W, Miller, W, Myers, E W and Lipman, D J (1990) J Mol Biol 215, 403-410
9 Stoehr, P J and Omond, R A (1989) Nucl Acids Res 17, 6763-6764
10 Smith, U R (1993) A biologist's guide to the internet. Usenet news.answers. Available via anonymous ftp from rtfm.mit.edu in pub/usenet/news.answers/biology/guide.
11 Burks, C, Cassidy, M, Cinkosky, M J, Cumella, K E, Gilna, P, Hayden, J E-D, Keen, G M, Kelley, T A, Kelly, M, Kristofferson, D and Ryals, J (1991) Nucl Acids Res 19, 2221-2225
12 Stoehr, P J and Cameron, G N (1991) Nucl Acids Res 19, 2227-2230
13 Bairoch, A (1991) Nucl Acids Res 19, 2241-2245
14 Kamel, N N (1992) Comput Applic Biosci 8, 311-321

Note added in proof: We now provide network biological information services on WAIS, Gopher (biomed.nus.sg), WWW (http://biomed.nus.sg:80/) and an emailserver ([email protected]) which are freely available.

Appendix 1

List of biocomputing software which we have implemented on our computer network. Many of these programs are available in public domain archives on INTERNET. Software introduced or taught to students is marked with an asterisk. References to these programs may be found in the review by Tan (1992). 1 The authors collaborate with several institutions including Institut Pasteur, Paris; World Data Center for Micro-organisms (WDC), RIKEN, Japan; Fred Hutchinson Cancer Research Center, Seattle and others to develop and provide comprehensive biocomputing and bioinformation resources for biologists, and routinely set up such resources by remote login to various Unix computer sites throughout the INTERNET. Interested parties may contact the authors by email (see above).

Description/Function of the software            Platform       Name of software

General sequence analysis software              DOS            SEQAID*
                                                Unix           GCG Genetics Computing Group package*
Homology search of database                     Email server   BLAST*, FASTA*
GenBank sequence submission program             DOS            AUTHORIN*
Multiple sequence alignment                     DOS/Unix       CLUSTALV*
Design of PCR primers                           DOS/Unix       PRIMER*
Restriction map construction                    DOS            COMAP*
Simulation of gene cloning                      DOS            EASYCLONER*
Identification of signal peptide                DOS            SIGSEQ*
Protein sequence analysis                       DOS            PROFILEGRAPH*
Protein motifs identification                   DOS/Unix       PROSEARCH*
Sequence format interconversion                 DOS/Unix       READSEQ*
Enzyme kinetic analysis                         DOS            EZ-FIT*
Phylogenetic linkage analysis                   DOS/Unix       PHYLIP
Bibliographic database management               DOS            REFERENCE MANAGER*
Gene modeler                                    Unix           GM
Identification of transcription factor sites    Unix           TFDsearch, SignalScan
Electrophoresis band mapping                    DOS            ELBAMAP

Glycobiology: Background and Development

by J A Cabezas. pp 82. University of Salamanca, Spain. 1993.

This is the text (in English) of a lecture delivered on the occasion of the inauguration of the academic year in Salamanca. It traces the history of glycobiology from the biochemistry of mucins over 150 years ago, to our present knowledge of glycoproteins, glycolipids and glycosaminoglycans, glycoconjugates and glycobiotechnology, as well as their nomenclature. The text is expansive in quotation and historical background, and in addition has a number of Tables giving references to reviews published since 1957 and 1964, respectively. It develops the context of a sentence of N Sharon, who claimed (as early as 1975) that "the specificity of many natural polymers is written in terms of sugar residues, not of amino acids or nucleotides". G W Hart (1990) in Glycobiology (vol 1, p 1) stated:

Glycobiological research is among the last 'great frontiers' of biochemistry and remains at the crux of understanding many aspects of both cellular and metazoan biology.

Copies of this lecture, in English or in Spanish, are available from the author, Professor J A Cabezas, Department of Biochemistry and Molecular Biology, Avenida del Campo Charro, s/n, 37007 Salamanca, Spain: fax (23)294513. A contribution of $12 towards the cost of the photocopying and mailing would be appreciated.

E J Wood


Letter to the Editor

From S K Singla

Formulae for ATP yield from β-oxidation of fatty acids

Dear Sir

The formulae for calculating energy yield on oxidation of fatty acids given by Akunekwe 1 are interesting, but the one for fatty acids having odd numbers of carbon atoms is inaccurate, as the following calculation will show. Consider a fatty acid with 17 carbon atoms: ATP produced as per the given formula = (17 × 17 - 51)/2 + 6 - 3 = 122, which is lower than the ATP yield from a fatty acid with 16 carbon atoms (129). The apparent cause of this discrepancy is the wrong presumption that propionyl-CoA oxidation yields only 6 ATP. Propionyl-CoA is derived from a three-carbon acid and is expected to yield around 24 ATP, since the average ATP yield per carbon of a fatty acid is about 8 (129/16). This argument is strengthened by considering the ATP yield from some three-carbon acids. Pyruvate, which has an oxidized α-carbon atom, yields 15 ATP (12 in the TCA cycle + 3 from the NADH produced in the oxidative decarboxylation step). Lactate, which has its α-carbon at an oxidation level between those of pyruvate and propionate, yields 18 ATP.

Coming to the crux of the problem, the 6 ATP considered by the author are accounted for in the oxidation of propionyl-CoA to oxaloacetate through succinyl-CoA. For net oxidation of oxaloacetate to CO2 and H2O, it is first converted to phosphoenolpyruvate which, in turn, is converted into pyruvate as shown below:

Oxaloacetate + GTP = Phosphoenolpyruvate + GDP
Phosphoenolpyruvate + ADP = Pyruvate + ATP

There is no gain or loss of ATP equivalents in the process. Therefore, propionyl-CoA oxidation is expected to yield 21 ATP (6 ATP on the way to oxaloacetate plus 15 ATP from the resulting pyruvate) and not merely 6 ATP. This ATP yield from the oxidation of propionyl-CoA is reasonable when compared to the energy yield from oxidation of pyruvate and lactate. The total ATP produced from oxidation of a fatty acid with 17 carbon atoms = (17 × 17 - 51)/2 + 21 - 3 = 137. This yield agrees well with the expected ATP yield of about 8 ATP per carbon atom of the fatty acid.

I would like to simplify the formula further. For a saturated fatty acid with an even number of carbon atoms, ATP yield = 17n/2 - 7, where n is the number of carbon atoms.

It will be a good exercise to work out the ATP yield from unsaturated fatty acids. For monoenoic acids, the ATP yield will be less by 2 since the FADH2 produced will be one less than from the corresponding saturated fatty acid. For polyenoic acids, there is additional complexity since a double bond is first reduced using NADPH (-3 ATP) and is reintroduced generating FADH2 (+2 ATP). Therefore, linoleic acid will yield only one ATP less than stearic acid. It can be generalized that for polyunsaturated fatty acids having x double bonds, the ATP yield will be 2x - 3 less than that from the corresponding saturated fatty acid.
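The arithmetic in this letter can be checked with a short Python sketch. It uses the ATP equivalents quoted in the letter; the function names are hypothetical, and the odd-chain expression (17n - 51)/2 + P - 3, with P the ATP credited to propionyl-CoA, is an assumed generalization of the 17-carbon example rather than a formula quoted from either author.

```python
def atp_even_saturated(n):
    """ATP yield of a saturated fatty acid with an even number n of carbons."""
    if n < 4 or n % 2:
        raise ValueError("n must be an even number of carbon atoms, at least 4")
    return 17 * n // 2 - 7


def atp_odd_saturated(n, propionyl_yield=21):
    """ATP yield of an odd-chain saturated fatty acid.

    Assumed generalization of the letter's 17-carbon example;
    propionyl_yield is the ATP credited to propionyl-CoA (21 in the letter,
    6 in the formula being criticised).
    """
    if n < 5 or n % 2 == 0:
        raise ValueError("n must be an odd number of carbon atoms, at least 5")
    return (17 * n - 51) // 2 + propionyl_yield - 3


def atp_even_unsaturated(n, double_bonds):
    """Apply the letter's corrections for x double bonds to an even chain."""
    if double_bonds < 0:
        raise ValueError("double_bonds cannot be negative")
    if double_bonds == 0:
        deficit = 0
    elif double_bonds == 1:
        deficit = 2                      # one FADH2 fewer
    else:
        deficit = 2 * double_bonds - 3   # the letter's rule for polyenoic acids
    return atp_even_saturated(n) - deficit


palmitate = atp_even_saturated(16)
c17_corrected = atp_odd_saturated(17)
c17_original = atp_odd_saturated(17, propionyl_yield=6)
stearate = atp_even_saturated(18)
linoleate = atp_even_unsaturated(18, 2)
print(palmitate, c17_corrected, c17_original, stearate, linoleate)
```

With these rules the 17-carbon acid yields more ATP than the 16-carbon acid only when propionyl-CoA is credited with 21 ATP, which is the point the letter makes.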

Acknowledgement

The author is grateful to Professor A K Goswamy for his valuable suggestions.

Reference

Akunekwe 1 M (1993) Biochem Educ 21, 74

S K Singla
Department of Basic Sciences
University of Horticulture and Forestry
Nauni, Solan (HP) 173230, India