Disaster recovery plans: Two case studies


Computer Audit Update

April 1994

protect key business information when its integrity is important. It takes time for people to adjust to a new security regime, and in the interim, I think every application audit should ideally include a quick look at Unix. This, of course, raises questions about how to carve enough time out of the audit budget for an application to look at the operating system as well: the second of these articles attempts to identify some key questions, and highlights the ways for Unix novices to get quick answers.



Alison Webb is an independent computer audit specialist. She divides her time between advising on security for specific applications, mainframe control security reviews, general reviews of computer installations and file interrogation. She also lectures and writes on computer audit topics.

DISASTER RECOVERY PLANS: TWO CASE STUDIES

Stephen Hinde

Deciding to implement a computer disaster recovery strategy is, relatively speaking, the easy step. The hard work of actually preparing a disaster recovery plan comes after that decision has been taken. But what to include in a disaster recovery plan? How much detail should there be? Who should be involved? These are the questions that are usually asked by management, or by the 'volunteer' given the responsibility for producing the disaster recovery plan. Simply put, the disaster recovery plan must set out clearly to those responsible for the recovery process:

• what to do;
• how, when and where to do it;
• what resources are available;
• where those resources are located.

The plan itself should cover the following areas:

Introduction
• purpose and scope of the recovery plan;
• recovery and policy;
• critical application list;
• plan responsibilities/schedule;
• testing summary;
• team structure and responsibility summary;
• distribution list.

Plan activation
• detection of the disaster;
• notification to management;
• escalation decision levels/schedule of authorities;
• damage assessment;
• declaration of a disaster;
• activation of recovery procedures;
• activation of command centre;
• activation of recovery teams;
• invocation of alternative processing facilities.

Recovery teams
• membership;
• objectives;
• tasks;
• action checklists;
• critical resources/vital records lists.

Testing
• component testing;
• integrated testing;
• disaster testing.

Maintenance
• responsibility;
• procedures;
• schedules.

Vital records required on invocation
• alternate site location(s);
• command centre location(s);
• off-site data storage location(s);
• staff contact lists;
• user contact lists;
• critical resources checklists;
• supplier contact lists for: hardware; software; communications; consumables; services; local authorities; insurance; emergency services.

Recently, I have been asked, on behalf of a number of organizations, whether I had any examples of disaster recovery plans that would show how organizations had actually written them, so that others could learn from that experience rather than trying to follow what some textbook had said on the subject.

I have selected two very different examples of disaster recovery plans and sanitized them to remove any possibility of identification. In essence, the prime difference between the two is that one organization planned by functional responsibility teams and the other planned in detail by application. The disaster recovery plan for organization B will be published in Computer Audit Update in six parts beginning this month. The disaster recovery plan for organization A will be published in the Computer Audit Journal, which is being issued free to Computer Audit Update subscribers.

But before looking at these two organizations' disaster recovery plans, let us see what lessons can be garnered about disaster recovery plans from other organizations. A study by the US Department of Defence, based upon reviewing a number of disaster recovery plans that had been tested, or used for real, revealed the following areas for improvement.

• Names and telephone numbers of recovery team members were listed in the body of the disaster recovery plans. Placing this information in an appendix makes it easier to change.
• Disaster recovery plans are often 1000 pages in length. The very bulk of such plans intimidates first-time users. Condensing the text and removing redundancies would render a plan less formidable and more usable.
• Disaster recovery plans had little guidance about transferring backup data from off-site storage to alternate processing sites.

The main lessons from the Mercantile Credit fire with respect to disaster recovery plans were that they should be modular, kept up to date and accessible to those needing them in a disaster.

Organization A was a large company with many production sites organized in regions. All production sites and regional offices were attached to the star and cluster network emanating from the computer centre based at head office. In addition to the midrange computers, each production site was controlled, to a greater or lesser extent, by process computers. The approach taken here took in a much wider view: a whole business resumption plan covering the business computers, the process computers and the loss of the head office accommodation and facilities. The disaster recovery plan detailed, by application, the action to be taken at each of seven levels of disaster.

Organization B was part of a large group. It, and a fellow subsidiary, used some esoteric and mainstream computers and peripherals. These were common to both organizations, as were the operating system environments. There was a uniform data network connecting the two sites, with the network equipment situated separately from the two computer rooms. Thus, the local area networks were unlikely to be affected in the event of a disaster to one of the computer rooms. The disaster recovery strategy, therefore, concentrated upon the provision of alternative computing power that was made available to the LANs on the two sites.

Their use of computing was also unusual, in that it tended to be project-orientated rather than based on routine periodic business applications. Because of this, and a perception that providing their own in-house solution would be more secure and confidential than a commercial third-party backup site, they formally agreed to provide warm reciprocal backup for each other, supplemented, if required, by a cold mobile facility that could be sited at either location.


The approach taken in producing their disaster recovery plan was one of identifying teams and responsibilities. A very prescriptive set of checklists detailing duties and responsibilities, with formal sign-off for each step, was established. The plan covered the who and how of declaring a disaster, notifying the backup site, moving to the backup site and returning to the home site after the disaster.

It was recognized by both parties that the amount of computer power available at either site was not adequate to meet the needs of both. Consequently, only core processing could be provided. In the event of a disaster, the affected site would decide what was essential processing and the backup site would cease all non-essential processing. There were agreed arbitration procedures to make the decisions if the two sites could not agree. Where substantial then the limited number of personnel requiring them would be temporarily relocated to the other site to avoid network congestion.

In order to avoid the problems usually associated with reciprocal arrangements, the two sites agreed to keep their computing developments, policies and procedures broadly in line with each other, and to meet on a regular basis to ensure continuing compatibility and protection.

The first part of this plan follows this introduction.
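As an aside, the core-processing rule in organization B's reciprocal arrangement can be illustrated with a small sketch. It is not taken from the plan itself: the `Workload` record, the capacity units, the job names and the rule of admitting essential work in list order are all assumptions made for illustration. In the arrangement described, disagreements over priority went to the agreed arbitration procedures, which the sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str        # hypothetical job or project name
    site: str        # site that normally runs the work
    essential: bool  # classed as core processing by its owning site
    load: int        # hypothetical capacity units


def plan_core_processing(workloads, capacity):
    """Split workloads into those the surviving site runs and those it suspends.

    Assumption: essential work is admitted in list order until the surviving
    site's capacity is used up; all non-essential work is suspended.
    """
    running, suspended, used = [], [], 0
    for job in workloads:
        if job.essential and used + job.load <= capacity:
            running.append(job.name)
            used += job.load
        else:
            suspended.append(job.name)
    return running, suspended


# Illustrative data only: two sites sharing one surviving computer room.
jobs = [
    Workload("project-x", "site-1", True, 40),
    Workload("archive-run", "site-1", False, 30),
    Workload("project-y", "site-2", True, 50),
    Workload("test-builds", "site-2", False, 20),
]
running, suspended = plan_core_processing(jobs, capacity=100)
print("running:", running)
print("suspended:", suspended)
```

With the illustrative figures above, both essential projects fit within the surviving site's capacity and both non-essential jobs are suspended. Lowering `capacity` shows the case the plan anticipates, where even essential work from both sites cannot all be accommodated.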

ORGANIZATION B DISASTER RECOVERY PLAN: PART 1

Stephen Hinde

Contents

The contents list for the complete disaster recovery plan is given below. Please use it as a reference for the other parts of the plan published in subsequent issues.

© 1994 Elsevier Science Ltd