Issues in designing a student database Philip Burnard
Many larger colleges of nursing have centralized student databases in which all elements of students' records are collated. The smaller department or college may leave the design of such databases to course leaders. This paper highlights stages and considerations in the development of student databases. It offers definitions of fiat-file and relational databases; it identifies issues in prior planning and then of direct planning of the database. It offers a discussion of a number of issues related to what sort of information is stored in a student database. It closes with a discussion of the training needs of those staff who are going to use such a database. The principles described in this paper can also be adapted for use in the designing of any health care related database system. The paper also addresses the question of why all colleges are not, already, using such systems.
INTRODUCTION
JA series of booklets about all aspects of the Data Protection Registrar, Wycliffe House, Water Lane, Wilmslow, Cheshire SK9 5AF, UK. These are particularly useful as there are conditions under which the information in them may be freely reproduced within an organization.
Whilst most o f the larger nursing colleges will have centralized student databases, many o f the smaller ones rely on course leaders to organize the collation o f student data - including names and addresses, various details about their courses and marks. There are many ways o f storing this sort o f information (and any stored information o f this sort must comply with the Data Protection Act1). This paper identifies some o f the issues in designing a student database for a small department or college. As more and more nurse educators become c o m puter literate, so the call to develop a variety o f types o f databases will increase. While this paper is concerned with the development o f a student records database, the principles contained in it are applicable to the setting up o f any database system and may be o f value to the
Nurse Education Today (1996) 16, 56-62
© 1996PearsonProfessionalLtd
Philip Burnard PhD, M~c, RMN, RGN, DipN, C'~rtEd, RNT, Director of Postgraduate Nursing Studies, University of Wales College of Medicine, Heath Park, Cardiff, UK (Requests for offprints to PB) Manuscript accepted 16 November 1994
student, researcher and manager as well as to the nurse educationalist. T h e primary issues that are involved in making decisions about this sort o f database are the types o f computers that are used within the department and the types o f database management programs that are available. Most colleges and departments will use either I B M - c o m p a t i b l e computers (personal computers or PCs) or Apple Macintoshes. As IBMs (or clones or compatibles) are probably the most frequently used, this paper c o n c e n trates on software that is available for those machines. Most o f the programs (or similar ones) are available for the Apple Macintosh. Recently, Macintoshes have been introduced which run both programs specifically designed for that c o m p u t e r and those that are more usually run on an I B M - c o m p a t i b l e PC.
SOFTWARE The first decision that has to be made is about the type o f software that is going to be used to build a student database system. These sorts o f programs are k n o w n as database management programs or simply as 'databases'. This can be confusing as the w o r d database can thus be used to signify a program or as a description o f the finished student record system. The meanings are used interchangeably in this paper as the context dictates. T w o sorts o f database programs are widely available. The first is the flat-file database. This sort o f program enables records to be produced which are essentially like a card-file system. Each 'card' holds details about one student. That card, however, can be very large and can hold an almost limitless amount o f information. At first glance, this may seem like the most useful way o f storing information about students. Each card, for instance, will contain details about that student's w o r k and home addresses, their course details, their sickness record, the marks and so on. If you need to find out anything about a particular student, you simply ask the c o m p u t e r to pull out that student's 'card'. For the smaller and most simple student database, the flat-file database system can w o r k quite well. Windows (Microsoft) contains its o w n flat-file database, called Cardfile which is very simple but suitable for storing names and addresses. It is unlikely to be useful as a full-scale student records database. However, it is not good database design to try to store all information in one place. Large 'record cards' can mean that certain sorts o f information may be duplicated and this, in
Issuesin designinga studentdatabase 57 turn, leaves open the possibility o f data entry error. If errors in data entry do occur, they can make it impossible to find required information at a later date. In any situation in which a large amount o f information is collected about a large number of people, it is preferable to use a program that can divide information into various categories. The second sort o f database program is the relational type. A relational database is one in which various tables, relating to various different sorts o f information can be generated. Information from those different tables can be pulled together on the screen at any given time. Thus the 'parent' or main table may contain details o f each students' names and addresses. Secondary or 'child' tables might contain details o f courses, sickness, marks and so on. Each 'type' o f information is stored in a different sort o f table. In order to make it possible to access information about each student from a range o f tables, each student has a unique n u m ber. This number is used to link the 'parent' table with the various 'child' tables. All of this may sound a little complicated. W h y , for instance, might it not be good practice to store all information about each student on a separate 'card'? At first glance, it would seem to be the obvious way to organize information. A little more thought, however, reveals that databases are not only used to look up information about single students. It might be useful, for example, to consult the database in order to bring together a range of student marks. This becomes essential when examination boards are being organized. Alternatively, it might be necessary to identify patterns o f sick leave in a given student cohort. While both o f these operations can be carried out using a flatfile database, they are more simply and elegantly achieved by t h e use o f a relational database. Por all but the simple systems, the relational database is the one to choose. For a more complete and detailed review o f the concept o f the relational database, the reader is referred to C o d d (1990).
box, loading them onto the hard disk and then developing a database system. All will require some manual reading. There are some excellent third-party books published about all aspects o f these programs (Simpson 1992, Jennings 1994). All of these programs also call for considerable systems resources. It is recommended that they are run on computers with large hard disks (80 megabytes or larger) and a minimum of 4 megabytes of R A M (random access memory, Veddeler 1992). More R A M will make data access quicker and the database system easier to use. In passing, it should be noted that there is one other type o f database program available, known as the freeform database. This type o f program allows data to be entered in any sort o f format: no prestmcturing of the program is required. Examples o f this sort o f program include Idealist (Blackwell) and Memory Mate(Broderbund). While freeform database programs may be of considerable value to the nurse researcher who has to handle textual data of the unstructured variety, they are unlikely to be o f particular use in developing a student record database which, almost by definition, calls for a highly structured approach to data management. O n e of the more annoying features o f all software is the rate at which it is upgraded and at which it consequently goes out o f date. Whilst there is no rule that says that all database users should use only the most recent releases of a piece o f software, this constant upgrading means that older versions go 'out o f print' and support for them is slowly dropped by the manufacturers. The college or department that is buying a database program needs to be sure that they are buying the latest edition o f a program and that they have someone who can use the program. Various training courses are available for learning the larger database packages and a number o f video training programmes are also available. It is essential that new software is introduced carefully and it is rarely sufficient merely to suggest to staffthat they 'read the manuals'.
Installing t h e database m a n a g e r Programs These are a variety o f commercial database programs available on the market and popular ones include Paradox (Borland), Access (Microsoft) and FoxPro (Borland). Each o f these three vary in their user-friendliness. Access is probably the easiest to set up and use, followed by Paradox, with FoxPro requiring some database expertise. Paradox and FoxPro are available for both D O S and W i n d o w s environments, while Access had only ever been a W i n d o w s program. All o f these programs require a little time to master. It is never a question o f taking the disks out o f the
Before the software can be used, it must be loaded onto a hard disk. N o n e o f the larger programs will run from floppy disks: all are supplied on a considerable number o f disks which have to be downloaded onto the hard disk. For most o f the programs, that is simply a case o f following the installation instructions at the front o f the manual. For a person who is familiar with computer software, it can be useful to have a period o f 'playing' with the program. This will involve experimenting with how the program works, h o w tables are designed and so on. A little time
58
N u r s e Education T o d a y
spent in becoming familiar with the main features o f the program is time well spent. This playing time can also allow for an assessment to be made o f how best to use the program at the planning state. Each database program differs, slightly, in the way that it organizes data. These differences can guide the database designer in terms o f the way that he or she plans the database. Database programs, like all software programs, seem to have different 'characters' and these also influence the way in which operators work with the programs. Playing with the program can offer new insights into organizing information.
DATABASE
ESSENTIALS
Almost all databases have three main c o m p o nents. First, there is the table feature. This is a simple 'rows and columns' arrangement which allows data o f various sorts to be broken down and stored in the database. For example, the 'rows' are likely to represent single students while the 'columns' will contain specific types o f information about those students. The first column, for example, may contain the students identification number. The second might contain his or her surname. The third would contain first names and the fourth that person's address. The table features o f a database program will be familiar to anyone who has used a spreadsheet. Indeed, for simple database needs, the spreadsheet can be used in place o f a database (Hartman 1994). In this case, the spreadsheet program would be working in
You n e e d a s p r e a d s h e e t if:
You n e e d a database if:
Your data consists of just one table (fiatfile).
Your data are stored in several related tables that you need to crossreference.
You need to produce only one o r t w o reports, and they look similar to the
You need to produce many reports, in a variety of formats, such as invoices, mailing labels and phone lists.
data table. types in each column of the table.
Your data has a uniform structure, each column consists of just one type of data.
If you need to do calculations, they are based on values scattered around the table.
If you need to do calculations, they are based on values appearing uniformly in the table such as down columns or
You need to store a m i x t u r e of data
across rows.
Summary calculations are based on the entire table, or on only a few subgroups within the table, and the number of subgroups stays fairly constant.
Summary calculations are based on many subgroups within the data and the number of subgroups or across rows.
Graphs are preferred format for your output, and you need a wide variety of graph formats.
Most of your output is in the form of text reports,
exactly the same way as a flat-file database. The Box (below) identifies some o f Hartman's advice about choosing between a spreadsheet and a database. It is, o f course, quite possible to use both a database and a spreadsheet program - for different purposes. O n e o f the points that does not, necessarily, emerge out o f Hartman's comparison is that while databases are probably best for handling large, complex textual data collections, spreadsheets are often the tool o f choice for working with financial and numerical computation. It is highly likely that the person who does accounting for a college has experience o f working with 'rows and columns' o f figures and the layout o f a spreadsheet will be immediately familiar to anyone who has experience o f bookkeeping. Thus it may be useful to use a spreadsheet for accounting and a database program for student records. Second, there is usually a forms feature. A form, in a database, can be one o f two things. It can be the means o f entering individual student's information into the database. Alternatively, it can be the means o f viewing individual student's information. T h e form calls up information from a range o f tables or it allows the user to enter information into a range o f tables. Either way, the form is usually a simpler way o f w o r k i n g with the database than is direct recourse to the table. The amount o f information in a table can be overwhelming. The form can allow the user to see a subset o f the information contained in a table. The third element found in most databases is the reporting feature. This is almost selfexplanatory. The reporting feature allows for reports about students to be generated in an attractive and usually printable format. The course director might use the reporting feature to prepare a record o f course marks for a range o f students prior to an end-of-year examination board. Alternatively, the reporting feature might be used to generate student records that are given to the students as they leave a course.
Planning
the
database
This is the most time consuming but important stage o f student record database development. The first stage o f it should take place away from the computer and involves the use o f pen and paper: Before considering h o w to organizing things on the computer screen, the database designer should have a brainstorming session. Here, the designer - perhaps with a number of colleagues - j o t s down all the possible information that might be stored in the database system. Does the college, for example, want to record all marks?
Issuesin designinga student database
Does it want a record o f the titles o f end o f course dissertations? W o u l d it be useRfl to k n o w the names o f the students' mentors? A n d so on. Nothing should be excluded at this stage. After the initial brainstorming session and after consultation with everyone w h o is likely to need access to the database, the designer should begin to narrow down what will and what will not be stored in the finished product. A typical list o f the sorts o f information that will be stored at the 'front' o f a database might be as follows: • The student's name • His/her address while on the course • His/her home address • H o m e phone number • Year o f entry • N a m e o f course, a n d so o n .
There are two schools of thought about h o w much information should be stored in a database. O n e argument suggests that as much as possible should be stored following the idea that you never k n o w what you may need to look up at a later date. Another argument is that it is best to veer on the side o f a 'leaner' database. This is both easier to use and takes up less storage space. The database designer needs to consider both approaches and decide which sort o f database suits this institution and this time. Most database programs can be modified even after they have been up and running for some time. Such modifications need to be done cautiously and by experts. Changing one element o f a database often leads to the need to modify others. O n c e various 'sorts' o f information have been decided upon, it is important to break d o w n the information further into discrete elements. These discrete elements will then become the fields o f the database. A field is a slot in the database which holds only particular type o f information. For example, 'surname' may be one field, while 'first names' may be another. The 'surname field' will only ever contain students' surnames. All other sorts o f information will be stored in their own, appropriate fields. Deciding on what should constitute a single field is no easy matter. There are books written about the process k n o w n as normalization or the process o f breaking down data into small and more discrete chunks (Date 1990). C o n sider the following issue as an example o f the sort o f problems tackled under the heading o f normalization. Should a person's full name constitute o n e field? Might one field contain the information 'Smith, Peter John'? O r should the name fields be broken up into three or four? Should there be separate fields for s u r -
59
name, first name (1), first name (2), first name (3) and so on? If so, h o w many 'names' should be allowed for and what happens if one student has five or six names? Most books on the subject suggest that surnames and 'other' names should be kept in separate fields. In my experience o f designing and using a range o f student database systems it has been perfectly workable to use one field to contain the surname and first names o f the students. The same seems to be true o f address fields. I have never found it particularly useful to break down the address fields into 'street', 'town', 'county' and 'post code'. It is quite useful to have a complete address stored in one field unless you are likely to carry out esoteric analyses o f students by town or students by COUnty.
Formal normalization theory can be complicated but Townsend (1992) offers the following summary and advice on the issue: Fortunately, normalization follows a c o m mon sense. Follow the golden rule of normalization: enter the m i n i m u m data necessary, avoiding duplicate entry o f information, with m i n i m u m risks to data integrity. This stage o f developing fields should, again, be one o f experimentation. It is useful to design a 'rough draft' o f the database and then to try it out by using some fictitious names, addresses and details. It is important to test out the database by setting up forms and reports to see whether or not these elements o f the program work well with the fields that have been designed. A slightly more complicated feature o f this part o f the process (and one which refers back to normalization) is the decision about h o w many tables to use in the final database program. While, as we have seen, some small colleges will be content to have all data contained in one, large table, others may wish to have a 'parent' or main table (containing basic details o f names and addresses) and to have this table linked to a variety o f 'child' tables containing other details. The pros and cons o f table design, at this level, are beyond the remit o f this paper and the reader is referred back to the manuals o f the particular program that he or she is udng. All relational database manuals refer to the design and development o f multiple tables.
DEVELOPMENT
Database design does not have to be a 'once and for all' process. It is possible, as we have seen, to make structural changes to it once the
60 NurseEducationToday program has been developed to a fairly sophisticated format. Also, it is possible to continue to add features to the database. It may be useful, for example, to simplify data entry as much as possible so that all staffcan enter student information quickly and accurately. This may mean modifying the opening screen (the screenful of information that appears which the program is first fired up) so that it is quickly obvious to the user what he or she has to do. W o r k on simplifying an opening screen may take some time and involve a range of menuing and macro systems (Braithwaite 1989). Similarly, reporting systems can be developed and enhanced. As a rule, it is usually best to try to achieve 'simple' reporting methods that offer clear and unambiguous print out. Despite these suggestions for developing the database, it is also important that once the database is in use the basic format o f it should not change. Staff who are using it will want to be able to rely on its staying much the same. Changes should only be instituted which enhance or ease the use of the program.
Data entry Once the database system has been designed, the student records have to be transferred into it. At this stage, it is usually helpful if the database designer sets the program to start up with a data entry form as the opening screen. In this way secretarial or management staff can work at processing a number of student records quickly and efficiently. The data entry form should be simple and a very basic colour scheme should be used. It is tempting, with modem software which allows the designer flee reign over a large number of colour schemes, to use increasingly complex colour variations. However, the person who enters data has to look at the screen for long periods o f time and restful and simple schemes should be preferred to the complex (Townsend 1992). A method of data checking should also be instituted whereby after a range of student records have been entered, the operative reviews the entries that have been made for accuracy and inclusiveness. After all, the 'bulk' entry of student data is usually a 'one-off process and it is important that it is done correctly. Subsequently, new student records can be added as students arrive. Modifications and additions to existing student records can also be made. By the time bulk entry is being carried out it is important that the database system has been thoroughly tested and reviewed. These processes can be achieved through the use o f three or four complete sets of 'dummy' data. The designer compiles three or four complete records of 'typical' students, plus, perhaps, one
or two further 'atypical' students. These are then entered into the database. The trial stage will allow the designer to check the following: • That there are sufficient and appropriate fields in the database • That the fields contain sufficient space for data entry (this is particularly important in the 'address' field, where some addresses may run to five o f six lines o f data • That the forms and reporting features of the database work correctly • That all users of the database system are going to understand how to enter and extract information.
Database manuals and staff training The database designer should also be responsible for ensuring that all staff who are likely to use the database can use it. There are still many people who are computerphobic. This issue can be approached from a number o f angles. First, it is possible to develop help screens within the program that allow the user to call up on-screen help at any point in the program. Second, the designer might write a simple database manual to be read by staff before and during the use of the program. This sort of writing takes considerable skill and can, usually and unfortunately, only be undertaken by the designer for it is only he or she who knows how the program works[ A third approach is to m n a series of study sessions for staff who are going to use the database. These can be tackled from a one-toone basis, where the designer sits with each user and talks him or her through the use of the program. If this method is used, it is essential that the user sits at the keyboard and that the designer is not tempted to give the potential user a 'hands-on demonstration'. It seems to be a consistent fact that regular computer users very quickly forget what it was like before they knew how to use a computer. The other approach to training is via the use of a computer lab in which a number ofstaffcan be guided through the use of the program from a central console. A final consideration is whether or not the completed database might be made available on a college network system (Loomis 1987). If this happens, then it is essential that the designer builds in various levels o f passwords. There must, for example, be passwords that prohibit some users from entering the program at all and others that prohibit certain staff (or students) modifying existing data. Most of the larger database programs have very sophisticated security features. Perhaps the most important feature of all in database design is simplicity. If a system o f
Issuesin designinga studentdatabase 61 recording and maintaining student records is to be used and staffare not going to be tempted to return to a 'paper and pen' approach to record keeping, the database system must be both simple and attractive. Such simplicity is only achieved through a considerable amount of prior planning and hard work.
Objections and problems All of this raises the question of why all colleges may not, already, be using computerized database systems. Setting up such systems involve a large financial investment in the purchasing of the appropriate computers and/or of a computer network. Next, the software, itself, costs a considerable amount and 'updates' of computer software are frequently offered by the manufacturers - a factor which must be borne in mind when costing a database application. Third, there is a considerable training implication. Once database programs have been designed, and the issue of who designs them and for how much money must be considered, there is the issue of training staff in their use. Any training program is likely to take staff away from the 'real world' of their normal work - at least temporarily and may or may not be costly. Someone, too, must become responsible for backing up the database program - for making sure that all of the data that is contained on the computer's hard disk is also on a second medium (usually, though not necessarily, floppy disks or tapes). This is a continuing task. All computer systems must be backed up on a regular basis and any data contained within them must also be duplicated in another format. Finally, there is the fact that many people still do not 'like' computers and are frightened by them. While training may help there, there will, presumably, always be a group of people in any organisations who prefer not to use computers. It makes sense to have an 'all or nothing' policy with regard to keeping student records on computer. If some staff use a computer and some keep written records, there is a likelihood o f duplication or o f mistakes being made. In the end, a college should probably either adopt a totally computerized record system or stick to other recording systems. Again, any training programs that are instituted will cost money. There is also the considerably issue of the attitudinal shift that may have to occur if all staff are to use the computerized system. Is it all worth it? It is probably reasonable to answer 'yes' to this question. Although the financial outlay is considerable and training and maintenance costs may be high, the
advantages of such a system are various. A database allows for the careful storage of a huge amount o f information in a small space. It also allows for that information to be accessed rapidly and in an array of formats. It is far easier, for example, to identify certain subsets of students and certain subsets of information using a computer database than it would be doing it 'by hand'. The 'processing' ability of a computer is considerable and a computer can clearly handle information more rapidly and in more complex patterns than can be achieved with a 'paper' system. The computer can also produce standardized reports and transcripts very quickly and easily. Error checking facilities can also be built into the system to indicate when human error has occurred in information input. This is unlikely to be the case with other forms of information management. In the longer term, network, both local and international, will allow for the transmission o f student data between centres and sites. While this raises all sorts o f ethical issues that are beyond the remit of this paper, it is also clear that a computerized database system can make the organization o f informarion on local and global levels more efficient. O n the other hand, it seems likely that earlier visions o f 'paperless offices', in which almost everything is stored only on a computer, are likely to become myths. Computers probably do aid efficiency in many ways but they do not remove, entirely, the need for written records and reports. However, ira student database is to be effective, it is important that as m u c h information as possible is contained within the database system. It is also essential that all o f the required information, in an organization, is entered, into the database. The system will not function if some information is on c o m puter and some is kept on paper. It is also essential that an effective backup system is maintained. If a computer system 'crashes', essential data may be lost. If a computer system is stolen, data will certainly be lost.
CONCLUSION This paper has focused on the issues involved in designing a student database system. The principles contained in the paper apply to the design o f any nursing or health care related database system. Before a database is set up, the planning needs to take place at the 'pen and paper' level. Some normalization of data is essential and a thorough testing of the system before 'real' data are inserted is vital. Ease of
62
NurseEducationToday
g e t t i n g d a t a i n a n d o u t o f t h e s y s t e m is also a p r i o r i t y a n d - as w i t h m a n y t h i n g s - s i m p l i c i t y is t h e k e y n o t e .
REFERENCES Braithwaite K S 1989 Systems Design in a Database Environment. Intertext Publications. New York Codd E F 1990 The Relational Model for Database Management: Version 2. Addison-Wesley, Reading, Massachusetts Date C J 1990 An Introduction to Database Systems,
Volume 1. Addison-Wesley, Reading, Massachusetts Hartmann E 1994 Get a Handle on Database Design. Windows International 5 (4): 174--180 Jennings R 1994 Using Access 1.1 for Windows. Que, Carmel, Indiana Loomis M E S 1987 The Database Book. Macmillan, New York Simpson A 1992 Mastering Paradox 4 for DOS. Sybex, San Francsico TownsendJJ 1992 Introduction to Databases. Que, Carmel, Indiana Veddeler S 1992 Upgrading and Maintaining Your PC. Abacus, Grand Rapids, Michigan