Principles of recoverability in transaction-processing systems

The provision of a cost-effective transaction-processing system is aided by carefully analysing its recovery requirements and making the best tradeoff between cost and performance during the system design and definition phases. Recovery techniques for transaction-processing systems are described. Typically, the type of system considered has a medium-to-large database (i.e. 100 Mbyte to over 1 000 Mbyte) and supports a medium-to-high transaction rate, with high availability and a mean response time of a few seconds. The approach adopted is based on an analysis of possible types of failure, in terms of
• the symptoms exhibited and fault diagnosis,
• the formulation of recovery objectives (the mean time to recovery (MTTR) for that fault),
• the selection of an economical set of recovery procedures for that fault and that MTTR,
• the identification of the stage of procurement, design or operation when specific recovery decisions must be taken,
• the implications for facilities in database management system and teleprocessing monitor software.
The analysis is followed through for a typical large system.
Recovery techniques applicable to the broad middle range of transaction-processing systems will be discussed. Typically, such a system runs on a medium-to-large general-purpose computer, uses a significant proportion of general-purpose software (operating system, data management/database management system (DBMS), teleprocessing (TP) monitor, high-level language compiler) and supports access from a terminal network (tens to hundreds of terminals) to a medium-to-large database. The paper establishes a framework of analysis techniques to enable a designer to
• formulate resilience objectives, on the basis of stated user requirements,
• make a cost-effective choice of resilience mechanisms,
• evaluate software facilities to support those mechanisms.
RECOVERY IN TELEPROCESSING/DATABASE ENVIRONMENT

The principles governing recovery in a TP/database environment are that
• the result of any failure should be predefined in principle, and a recovery strategy, whether manual or automatic, should exist,
• inconsistent database states should not be accessible to application programs,
• there should be a range of recovery strategies available for user selection, to enable a specified mean time to recover (MTTR) to be achieved economically for a given failure type,
• where a failure, and recovery from it, is local to a process, other concurrent processes should not be affected,
• where a consistent state of the database can be achieved by a deterministic procedure, the software should perform this automatically,
• where the failure is of a non-recurring nature (e.g. timeout or deadlock), automatic reinstatement of processing should be attempted.
ESTABLISHING A RESILIENCE POLICY

The method of analysis presented is based on a study of failure types. There is a series of important stages in this process:
• fault detection: there is often a wide gulf between the symptoms exhibited by a system and the underlying fault; the treatment of symptoms without understanding the causes can be dangerous,
• fault frequency, speed of recovery: there is a strong interaction between the mean time between failures (MTBF) for a fault and the MTTR specified for it; overall system availability of a given level can be achieved in many different ways - it is the designer's job to choose a cost-effective combination (a worked illustration follows this list),
• what state to recover to: recovery need not be perfect in terms of total reinstatement of processing and an up-to-date data state; what matters is that users should be aware of the result of a recovery, should have agreed to and planned for the impact on their procedures, and that the recovered state should be known, generated and satisfied,
• choice of recovery procedure: having made the above decisions, the designer is in a position to select the recovery techniques to be used and to define the operational and/or software control procedures for their use,
• evaluation/selection of software facilities: recovery, particularly where a certain amount of automatic decision making is taking place and the MTTR is low, makes significant demands on software facilities; the extent to which special-purpose modules can be 'bolted on' to existing software may be limited,
• insurance premiums: laying the foundation of recoverability requires investment in overheads to normal processing; the level of this is dictated by the decisions taken above; the lesson is that recoverability design is intimately linked with the discipline of system sizing.
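As a rough illustration of this tradeoff: steady-state availability is MTBF/(MTBF + MTTR), so a given availability target can be met either by making faults rarer or by recovering from them faster. The figures in the sketch below are hypothetical, not taken from the paper.

    # Availability arithmetic behind the MTBF/MTTR tradeoff.
    # All figures are hypothetical illustrations.
    def availability(mtbf_hours, mttr_hours):
        # Proportion of time the system is up in the steady state
        return mtbf_hours / (mtbf_hours + mttr_hours)

    # Two different routes to roughly 99.9% availability:
    print(availability(100.0, 0.1))    # faults every 100 h, 6 min recovery
    print(availability(1000.0, 1.0))   # faults every 1000 h, 1 h recovery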
The designer's course through the above decisions is not quite as smooth as the checklist suggests. As with all designs, there are cycles of iteration, involving questions such as
• 'Am I still meeting my objectives/user requirements?'
• 'Am I still within cost/performance constraints?'
Frequently, there is a limited range of options, with major steps in facility/cost/power between them. This often makes the formal quantification of design tradeoffs difficult. Because the result of the design process is an integrated and coherent set of recovery procedures, the designer must take pains to ensure that the recovery procedures are consistent and compatible. For example, there is little point in dual-recording the database if a loss of the system catalogue results in a 5 h break.
FAILURE TYPES
Before discussing fault diagnosis, it is appropriate to identify the useful distinctions between failure types. Some of the failure types listed below can be regarded as potentially caused by others, and several such faults may be present at any one time. However, the types of failure presented are identified as being useful points from which to ask the questions
• how long will it take to recover from this?
• to what state?
• by what means?
• using what resources?

Application program failure
There are several types of application program failure. They are:
• voluntary termination; the program decides, for application reasons, to abandon processing,
• phase discard; a (typically batch) program may wish to abandon the current phase (e.g. a transaction group), and to discard the database updates, but to resume with the next phase ('phase' is commonly used as a synonym for 'success unit'),
• faulty instruction code or in-store data; the program cannot proceed because to do so would be illegal,
• transient system error of a recoverable kind (e.g. deadlock over resource allocation),
• error-code return to the application program from a call on an operating system function,
• detection of a logical fault in the database (possibly caused by a logical fault in the application code).

Failure of software
This can be a
• complete crash of the operating system,
• failure of the DBMS (without an error return),
• failure of the TP subsystem.

Backing-store damage
This can be
• significant damage to the application database,
• localized damage to the database,
• logical corruption of the database (i.e. the results are plausible to the software; only applications and/or users can detect the trouble),
• loss of vital operating system files - e.g. recovery files, message logs, etc.

Hardware failure
This could be a failure in the
• TP network (partial or complete),
• CPU or main store,
• backing storage.
FAULT DIAGNOSIS
Fault diagnosis is a problem, and a complete general solution is not possible. Ultimately, it is up to the system designer to cooperate with his operations manager and system controller in defining a tailor-made set of diagnostic and recovery procedures. However, the paper attempts to provide some guidelines.
Principles for fault diagnosis and recovery procedure design
For each possible symptom, there should exist a set of diagnostic procedures/instructions. Once the software has been debugged, programs are faster and less error-prone than operators; therefore automatic diagnosis should be attempted where feasible and cost justified. Operators will make mistakes; therefore manual procedures should embody self-checking and fail-safe features (which themselves can be computer-aided). The diagnostic procedures should knit together in such a manner that an intransigent fault is passed through levels of escalation until it is fixed, rather than being ignored and left to cause trouble.
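To make the escalation principle concrete, the sketch below passes a fault through successive diagnostic levels until one of them identifies a cause; the level names and control flow are illustrative assumptions, not facilities of any particular system.

    # Sketch of diagnostic escalation: an unresolved fault is passed up
    # through successively more powerful (and more disruptive) levels
    # rather than being ignored. The levels themselves are hypothetical.
    def diagnose(symptom, levels):
        for name, check in levels:
            cause = check(symptom)
            if cause is not None:
                return name, cause          # fault identified at this level
        raise RuntimeError('intransigent fault - escalate to manual repair')

    # Usage: levels run from cheap automatic checks to full manual analysis,
    # e.g. diagnose(symptom, [('self-check', software_self_check),
    #                         ('operator', operator_checklist),
    #                         ('DBA', dba_analysis)])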
Symptom classification
The range of basic symptoms is presented in Figure 1. One of these types, accompanied by appropriate documentary support, will usually be the starting point for a diagnostic process.
Figure 1. Symptoms classification (a chart relating the sources of symptoms - end user, data-processing staff, DBA, system controller, operator and system - to the basic symptom types: cannot access, no reply, wrong form, odd result, wrong content, manual diagnosis, emergency message, fault found by routine check, system crash, run-unit failure, recovery message, and messages from user, programmer, DBA or system)
Diagnostic procedures
The design of diagnostic procedures is a process very specific to individual requirements. Table 1 can be used as a starting point. In it, each symptom is evaluated for relevance to possible causes. For each cell in the matrix, the designer should consider (a sketch follows this list):
• what is the probability of this being the cause?
• what diagnostic tests can be performed?
• should this be a software or manual decision?
• in what sequence should the tests be evaluated?
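For illustration only, such a matrix might be held as a table of symptom/cause weightings and consulted to order the diagnostic tests; the symptoms, causes and probabilities below are hypothetical placeholders, not entries from Table 1.

    # Hypothetical fragment of a symptom/cause matrix in executable form:
    # for each symptom, candidate causes are ranked by estimated probability
    # so that diagnostic tests can be tried in a sensible sequence.
    matrix = {
        'cannot get in':  [('network failure', 0.5), ('system crash', 0.3)],
        'odd result':     [('application bug', 0.6), ('logical corruption', 0.2)],
        'run unit fails': [('deadlock', 0.4), ('faulty code', 0.4)],
    }

    def candidate_causes(symptom):
        # Highest-probability causes first; each would carry its own tests
        return sorted(matrix.get(symptom, []), key=lambda c: -c[1])

    print(candidate_causes('odd result'))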
RECOVERY STRATEGY
The aim of the diagnostic process is, in all cases, to arrive at an identification of the failure type. Objectives of the recovery process are the containment of the damage (by, for example, preventing access to bad data, or by preventing use of unreliable programs) and the recovery within a specified time to a predetermined state.
Automatic diagnosis
Where it proves feasible to make the diagnostic decision automatically, the option exists to perform the recovery processing automatically, as long as the scope and nature of the damage is known clearly enough, a deterministic recovery procedure exists and the necessary resources (e.g. a before-look journal) are intact. In any case, it is highly desirable, at the very least, to ensure automatic fault containment as far as possible. This is why, where an automatic diagnosis fails, the software should make pessimistic assumptions in deciding, for example, what data/programs to disable and/or what subsystems to close down. After an exception condition, the software should automatically attempt to return the database to its last consistent point. It is only under certain conditions that aborted run units should be automatically restarted. For example, if a program caused a run-unit exception by violating address limits or generating bad numeric data, it will continue to do so, however many times it is restarted. For each program, it should be possible to specify choices from a range of options. Thus, when an error occurs, a contingency handler (most likely associated with the TP monitor) can carry out specified actions (a sketch of such a handler follows this list), for example:
• analyse the failure type (transient, storage corruption, deadlock, requested termination and permanent fault),
• try to restart where the failure type offers a reasonable chance of success,
• inhibit the transaction type after n failures of a specified type,
• store working data in a place of safety,
• warn the user,
• if in test mode, make some compromise recovery, e.g. skipping some instructions and continuing processing.
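A minimal sketch of such a handler follows; the threshold of three failures, the set of failure classes treated as transient and the call names are assumptions made for illustration.

    # Illustrative contingency handler in the spirit of the options above.
    RETRYABLE = {'timeout', 'deadlock'}      # transient - worth a restart
    fault_counts = {}                        # misdemeanours per transaction type

    def on_failure(txn_type, failure_class, restart, inhibit, warn_user):
        fault_counts[txn_type] = fault_counts.get(txn_type, 0) + 1
        if fault_counts[txn_type] > 3:       # n failures -> inhibit the type
            inhibit(txn_type)
        elif failure_class in RETRYABLE:     # reasonable chance of success
            restart(txn_type)                # after success-unit roll back
        else:
            warn_user(txn_type, failure_class)   # leave for manual decision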
There is no simple way of unravelling the results of subtle corruption which has subsequently been used by other run units. End users must participate in correcting the database; they can be considerably helped by the appropriate analysis of the log files. The basic parameters required are: date/time, program/transaction type, user identity, record before states, record after states and transaction contents. Analysis by record occurrence, showing the runs participated in and the alterations made, will also be of value.
Processing recovery
The levels involved are:
• the program itself is unaware of recovery taking place, e.g. instruction retry, I/O retry, DML command roll back and retry, and software that traps errors and carries out predefined circumvention,
• the program automatically recovers (resumption at the start of a phase: check the failure type for recoverability, reset the working storage and resume),
• operator-controlled restart (database recovery: select the program status checkpoint that corresponds and run the program in restart mode),
• manually controlled resumption from the beginning.
The restart of a run unit that consists of a series of units, some but not all of which have been completed, requires synchronization of the database state, program state, working storage contents and nondatabase files.
EXAMPLE OF RECOVERY ANALYSIS
For this example, the existence of various features must be presumed:
• online processing, including a high proportion of updating, must be supported approximately 20 h/day,
• concurrency is such that block- or page-level locking is necessary,
• the database is large - in excess of 1 000 Mbyte; because of this, and because of a modest availability requirement, it was decided that dual recording was uneconomical,
• the overall recovery objectives were that not more than 5% of computer processing should be devoted to recovery operations; that, where the type of failure permits it (mainly the scope of database damage), recovery delay should not exceed 60 s; that, where failures are limited to a run unit, concurrent processes should continue unaffected; and that the MTTR for an (occasional) disc failure could be of the order of 1 or 2 h,
• the use of a data-management system that relies on page locking and roll back for phase recovery has been presumed; alterations in the analysis to make use of delayed updating are fairly slight.
Table 1. Association of possible failure symptoms with underlying causes

(Table 1 is a matrix marking, for each symptom - cannot get in, no results, incorrect form, incorrect content, live diagnosis, testing error, error during routine check, error detected by other, recovery message, voluntary VM failure, involuntary VM failure, system crash, message from user, message from DBA, message from system, fault report - the underlying causes to which it may or may not point.)
Application program failure

Voluntary termination
The objectives are to permit the run unit's effect on the database to be removed and to permit the run unit to be abandoned or continued. The procedures to be followed are to stop the run unit, remove its updates by roll back, release locks and free resources, and to return to application code with error status. The requirements of the software include (a journal sketch follows this list):
• the dynamic invocation of roll back with the deletion of the run unit,
• that the run-unit roll back should not affect other run units' response unduly - i.e. ≤ 10 s for trivial transactions,
• the ability on invoking 'transaction-end' to declare whether it is 'good' or 'bad',
• that the journal should contain a transaction-start recovery point for the run unit,
• that, for batch programs, the ability to establish recovery points at intervals during processing is needed; ideally, the proportion of resources devoted to the establishment of such checkpoints will remain low, i.e. approximately 5%,
• that batch programs must be able to resume processing or restart at the end of the last completed success unit; correct alignment of the database state, other files, program point of control and the specified program working storage is necessary,
• that the recovery procedure must be safe regardless of the exact point at which the failure occurred,
• that, for batch processing, the system must keep a log of programs in a 'waiting-to-restart' state.
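These journal requirements can be pictured as a stream of tagged records; the sketch below shows a transaction-start recovery point, before-looks, a 'good'/'bad' verdict at transaction-end, and roll back by re-application of before-looks. The record layout is a hypothetical illustration, not the format of any particular DBMS.

    # Hypothetical journal supporting the requirements above.
    journal = []

    def transaction_start(run_unit):
        journal.append(('start', run_unit))      # recovery point

    def before_look(run_unit, page_id, old_image):
        journal.append(('before', run_unit, page_id, old_image))

    def transaction_end(run_unit, good):
        journal.append(('end', run_unit, 'good' if good else 'bad'))

    def roll_back(run_unit, restore_page):
        # Undo by re-applying before-looks, newest first, back to 'start'
        for rec in reversed(journal):
            if rec[0] == 'start' and rec[1] == run_unit:
                break
            if rec[0] == 'before' and rec[1] == run_unit:
                restore_page(rec[2], rec[3])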
Run unit failure
The objectives are that the effects of this run unit must be deleted. Misdemeanours of this type must be counted and, when a limit is exceeded, the use of the transaction type must be inhibited. If the program is a batch run unit in a restartable state, with the database secure and consistent, it should be left so. Details should also be recorded and presented for manual decisions to be made. The procedures to be followed are to:
• perform roll back to the start of the current success unit,
• release all resources held by the program,
• transmit a message to the user,
• for batch programs only, establish the program as a 'need to restart' at the last successful checkpoint.
The requirements of software are that the journal should contain a success-unit start-recovery point for the run unit. This type of closedown should have a minimal effect (≤ 10 s) on other run units. Batch programs must be able to establish checkpoints, and must also be able to resume processing at the end of the last completed success unit. Other requirements are that diagnostic material to aid manual decisions should be recorded on suitable media and that, where online transactions are concerned, a message of acceptable form should be displayed to the terminal user. The recovery procedure must be safe regardless of the exact point at which the failure occurred and, for batch processing, the system must keep a log of programs in a 'waiting-to-restart' state.
Deadlock
The objectives are to identify one of the participating run units and to release all its resources; to resume processing automatically at the start of the current success unit (priority being given to TP over batch); and to record the event and keep a count per transaction type. The procedure is to perform success-unit roll back for the chosen run unit, to realign and reset all necessary resources (including such actions as the deletion of the spooler output for this phase) to permit restart, and to restart the run unit. The software requirements include making sure that the journal contains a success-unit recovery point for the run unit and that contingency handling exists to distinguish these types of error. This type of closedown should have a minimal effect (≤ 10 s) on other run units. For batch programs, the ability to establish checkpoints is needed, and they must be able to resume processing at the end of the last completed success unit. In addition, the recovery procedure must be safe regardless of the exact point at which the failure occurred.
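A sketch of this procedure follows; the victim-selection rule (sacrifice batch work in preference to TP) reflects the priority stated above, while the data shapes and call names are assumptions.

    # Sketch of deadlock resolution: choose a victim, roll it back to the
    # start of its current success unit (releasing its locks), log the
    # event against its transaction type and restart it.
    def resolve_deadlock(participants, roll_back, restart, log_event):
        batch = [ru for ru in participants if ru['is_batch']]
        victim = batch[0] if batch else participants[0]   # TP has priority
        roll_back(victim)                  # success-unit roll back
        log_event('deadlock', victim['txn_type'])         # keep the count
        restart(victim)                    # resume at success-unit start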
System/supervision software failure

Total system crash
The objectives are the analysis of the cause of the crash and the maintenance of an indicator of areas which are open for update during running. Database areas should be secured while recovery is in progress. One should also attempt recovery at a speed that is a function of the nature of the crash: recovery will take progressively longer for the cases in which the disc files and the main store are all right, in which only the disc files are all right, and in which selected disc files are lost. The procedure is:
• on restart, identify all the run units that have been started but not completed; roll back each run unit,
• where disc files have been corrupted, roll forward the database from the security copy before roll back,
• for the TP system, reconstruct all messages from the journals,
• restart active run units,
• if the failure is repetitive, undertake a manual investigation.
The requirements of the software include making sure that the journal contains a success-unit start-recovery point for each run unit. The software should also have the ability to establish checkpoints for batch programs, and these batch programs must be able to resume processing at the end of the last completed success unit. Software should have exclusive access to the failed areas while recovering. Logs and journals should be amenable to analysis for detection of the run units to be recovered, and system reload must be automatic in recoverable cases. A facility is needed to enable analysis of the database state, to determine the extent of the damage. Special requirements for long database recovery include the ability to take dumps by realm/file. Dumping 'on the fly' requires the ability to dump while files are in shared update mode (to minimize the interference of dumping with processing). The roll-forward utility must be able to work from static or 'on-the-fly' dumps. When roll forward reaches the point of the crash, the system roll back should be performed. A 'force new volume' command must exist for the journal file, to permit recovery concurrent with other processing on files that are all right.
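The crash-recovery sequence described above (roll forward from a security copy using after-looks, then roll back the run units left incomplete at the crash) might be sketched as follows; the journal record shapes and function names are assumptions.

    # Sketch of total-system-crash recovery.
    def recover_from_crash(dump, journal, apply_after_look, roll_back):
        database = dump.restore()              # needed only if disc files lost
        for rec in journal:
            if rec.kind == 'after':
                apply_after_look(database, rec)    # roll forward to the crash
        started = {r.run_unit for r in journal if r.kind == 'start'}
        ended = {r.run_unit for r in journal if r.kind == 'end'}
        for run_unit in started - ended:           # incomplete at the crash
            roll_back(database, journal, run_unit) # undo half-done work
        return database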
Data management software failure
In this case, the objectives are to establish the nature of the failure and the extent of the database corruption. The database state should be conserved until the DBMS is repaired but, where possible, restricted access to the database should be given to other run units. The DBMS should be repaired as quickly as possible. The necessary procedure is to:
• analyse the failure type,
• if the failure is potentially transient, invoke the overall quick recovery and output the diagnostic material,
• if it is not transient, output the necessary diagnostic material and a request for DBMS repairs,
• when the repair is effective, roll back and restart.
Failure of the TP subsystem
The objectives must be to return the database to a consistent state while the TP system is being repaired, and to recognize that this is a TP failure and to continue batch processing in the meantime (where it is safe and feasible to do so). One must also try to ensure effective communication with TP users as the repairs progress, and try to recover the TP system within about 5 min (depending on the error diagnosis). The procedure is:
• to retain all the data locks held by the run units affected, to protect concurrent users; if this information is lost, a full system crash must be invoked,
• when the TP system comes up again, to roll back the database for all current run units and restart the run units without the terminal operators becoming involved,
• to inform users at the terminals of the failure and of the state of their jobs,
• if the second procedure fails because of a hard failure in the TP code or a serious loss of TP recovery files, to choose between continuing batch processing only, with parts of the database locked out while TP repair takes place, and invoking a complete system shutdown.
In this case, the requirements of the software include dual recording of essential TP recovery files and an effective dialogue between the TP contingency handler and the data-management recovery states.
Loss of files

Significant or localized damage to backing store
This is detected by a run-unit crash, an operator or a user. The objectives are to determine the extent of the damage, to inhibit the processing concerned with the damaged files, to permit processing which concerns only undamaged files to continue, and to effect a repair of the damaged files as rapidly as is economically feasible. The procedure is as follows:
• suspend all the processing that is likely to be affected (in the case of TP run units, tender the appropriate apologies/warnings to the terminal users),
• diagnose the extent of the damage,
• recommence the processing that is 'safe',
• define the transaction types to be inhibited for the time being,
• select a recovery procedure,
• perform the recovery (restore the damaged files, carry out roll forward from the after-look journal as necessary until journal end, and then perform run-unit roll backs to the latest success-unit boundaries),
• reestablish processing; some run units can restart (batch processes from the checkpoint); some must be failed, to be subsequently recommenced,
• deinhibit the transaction types.
The requirements of the software are the ability to define success units to protect database consistency vis-a-vis intermediate states and, for batch programs, the ability to subdivide processing into success units so that database recovery is synchronized with the checkpoint/recovery points. The software should provide integrity-analysis tools to assist with the diagnosis, and should ensure the security of the journals. Roll-forward and roll-back utilities are also needed, to bring a dump forward to a 'real' time and to roll a file back by run unit to phase begin.

Detection of logical damage to application files
The aims in this case are to diagnose the extent of the database damage (the pointers affected, etc.), to rectify the damage and to conserve the state of the database. The method is to:
• stop the process,
• report the situation, decide on its severity and select automatic circumvention or abandonment (see the procedure for application program failure),
• impose temporary restrictions on file access and/or transaction types,
• diagnose the extent of the damage,
• repair the damage.
In this case, the requirements of the software are based on the view that this is an application problem. Application programs must be written to audit the logical consistency of application data, and programs detecting such logical faults must report the problem and take circumvention decisions. Repair activities may take the form of recoveries of the type mentioned under backing-store damage. Manual repair facilities at the character/word level are also required, as are the ability to inhibit certain application-program types during such repair activities and the ability to analyse the system journals to detect run units that may have caused a broadcast of the bad data.

Loss of critical system files
The catalogue, the TP message log, the TP interprogram-transfer file, the job queue and the journals for recovery are special cases, requiring cold start and manual intervention. It follows that such files should be heavily protected to minimize the frequency of loss, e.g. by dual recording and before/after-look journalizing.
Hardware failure
A network failure (whole or part) is equivalent to a TP system crash, a CPU or main-store failure is equivalent to an OS crash, and a disc failure is equivalent to disc-file damage.
ESTABLISHING A SECURE ENVIRONMENT

Type of decision
The recovery requirements must be predefined in good time, because several fundamental decisions have to be taken at very early stages to ensure that the operating environment offers sufficient capability and resilience to meet those requirements. Precautionary decisions should cover the following areas:
• the classes of run unit allowed,
• the levels of concurrency,
• which built-in recovery facilities should be configured in the installation's copy of the DBMS,
• the policing of run-unit behaviour (the timeout, the DML command types and sequences permitted, and the success-unit scope limits),
• the selection of the recovery techniques to be adopted,
• the types of recovery files (before looks, after looks, transaction log, size of locking units and time spans covered),
• the levels of redundancy of data, e.g. dual database,
• the security of the journal (dual files and remote devices),
• the level of hardware redundancy.
Installation definition
Facilities are required to enable the DBA to specify the maximum levels for recovery performance for each class of failure. The DBMS should provide appropriate attachment points, with standardized interfaces, for the enhancement or addition of facilities by the DBA. Examples of such facilities are database procedures, communication facilities and support for unusual devices. The DBA also requires facilities to describe to the DBMS the consequences of the following decisions:
• is it a multiple-database environment or a dedicated site?
• what are the outer limits of the availability and recovery requirements?
• is online working to be supported for update or read only?
• are online run units to have one or more success units?
• what levels of concurrency are required by the circumstances?
• what levels of concurrency are chosen for performance?
• what types of run unit are involved in concurrency?

Failure
Decisions on this cover the:
• general recovery procedure to be used for each failure type,
• exception rules - the circumstances in which an alternative recovery procedure is to be attempted,
• contingency processing - the definition of the hierarchy of recovery techniques to be attempted if a given technique fails for the specified reasons,
• communication protocol by which the DBMS requests manual intervention from the DBA,
• operating procedures associated with recovery,
• means by which the DBMS recognizes exceptional situations that it cannot handle fully automatically, and of which it must inform the DBA.
There should be a system command language for use at the time of a failure, for manual selection and direct control of the recovery processes. The DBA and the DBMS between them must take the following decisions:
• what is the exception type?
• is the database damage localized or global?
• which run units can be restarted automatically?
• which run units can be restarted manually?
• which recovery files are available?
• which recovery should be initiated?
• which users should be notified?

Database design
The decisions are concerned with the setting and policing of database design and usage standards; logging characteristics, including before looks, after looks, transactions, run stream and logging units; log security (single or dual, local or remote) and additional direct-access temporary logs and devices for logging; defaulting of automatic system actions for each class of failure; database and teleprocessing-network operating procedures; subdivision of the database into units for recovery control; and the protection level for each area.

Application system environment
The DBA has to assess the adequacy of the overall installation-integrity controls for each application, and take decisions on:
• override procedures to give manual action in place of system defaults,
• the characteristics of each application program, such as online or batch, performance/response-time characteristics, read only or update, expected traffic, profile of expected database use, success units (simple or general, scope and duration), recovery techniques to be used for each failure type, and restart characteristics,
• overall scheduling and program-coordination constraints (program sequences and interdependencies, and deadlines).
Run-unit scheduling
It should be possible to specify temporary variations from a particular program's operational profile as defined above. Using the total scheduling information available, the DBMS and operating system must schedule the run unit according to the response-time and/or deadline requirements, database availability, the availability of the appropriate integrity-control resources (e.g. the type of logging in use) and concurrency constraints.
High-volume online updating systems may also require some chronological indexing of the system logs, to be used to 'net out' the effects of multiple alterations to a given data object so that, where possible, only one copy of its image needs to be rewritten.
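A sketch of such netting out follows; the log-record shape is a hypothetical assumption.

    # 'Net out' a log: where a data object was altered several times, only
    # its latest after-image need be rewritten during recovery.
    def net_out(log_records):
        latest = {}
        for rec in log_records:            # records are in log order
            latest[rec['object_id']] = rec # later images supersede earlier
        return list(latest.values())       # one image per object

    log = [{'object_id': 'p1', 'image': 'v1'},
           {'object_id': 'p1', 'image': 'v2'},
           {'object_id': 'p2', 'image': 'v1'}]
    print(net_out(log))                    # p1 -> v2, p2 -> v1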
APPENDIX

Architecture of DBMS and TP monitors
The architecture of a conceptual model of a transaction-processing system is discussed, and a checklist of ideal features is derived. The purposes of the features are examined, and the merits and tradeoffs (cost and performance) in using those features are considered. A selection of currently available software products is checked against the list of features, as an example of the way in which a selection study may be undertaken, and as an indication of likely future developments. The approach is applicable to the evaluation of other special- or general-purpose transaction-processing software, and it provides a checklist for the designer of such software.
General-purpose products
A limited range of general-purpose products will be discussed. This approach has several advantages:
• the general-purpose products tend to be designed for wide applicability, and hence have easily recognizable features,
• it is easier for participants to particularize from the general case when special-purpose facilities are to be adopted.
The period in which mainframe vendors tended to pass off hastily adapted purpose-written products as true general-purpose products is now ending. Those products that originated in that way (such as IBM's CICS, Univac's TIP etc.) have been extensively redesigned and rewritten.
Broad requirements
The functional principles are:
• there should never be any permanent data loss; if this principle is violated, there is no guarantee of system security against accidental or malicious damage or tampering,
• regardless of the nature of the failure, the system should retain, if possible, knowledge of the terminal states, the message status and the last completed success units,
• if possible, closedown after a failure should leave the database consistent; if this is not possible, there should be a failsafe lockout to enforce restitution of consistency on resumption,
• where the fault is clearly identifiable, and a predetermined recovery procedure exists, the automatic recovery and reinstatement of processing should be the quickest, the least error-prone and the most complete.
Resilience features of a typical general-purpose DBMS
• level of locking: file/area, page/record,
• lock scope,
• resolution of contention,
• journalling: run-unit start/end, page/record,
• fast recovery from local DB damage, run-unit failure and system crash,
• secure recovery from wider DB damage,
• adequate interface with TP monitor,
• utilities.

DBMS recovery performance features
• ability to break long jobs into short, restartable success units,
• condensed/consolidated journals,
• parallel journalling by file group,
• request change of journal volume,
• run-unit before-images on disc,
• incremental dumping, with compatible roll forward,
• automation of initial diagnosis, basic consistency restoration and transaction recovery for some failure types.
Principles for DBMS interface to TP monitor
• jointly designed as an integral system,
• DB recovery achieves consistency even when some run units are batch (i.e. not under the TP monitor),
• DB recovery to a success-unit boundary takes account of the transaction/message/terminal state,
• database locks for multimessage-pair transactions.
Desirable characteristics
User and system requirements vary enormously, so it should be possible to exercise a reasonable range of options in levels of protection and recovery performance. The designer should be able to select from a menu of mechanisms giving a useful range of MTTRs for a given fault. Most successful computer installations grow rapidly, so the products should offer a useful upward-compatible evolutionary path. Some people occasionally change computer manufacturers; industry standards, such as COBOL and FORTRAN, are intended to make this easier. The resilience features of the Codasyl COBOL database facility are approaching suitability for standardization, but this is not true of TP monitor features. Even so, it is possible to watch out for peculiar features, compliance with which can tie an installation to a product for an inconveniently long time.
Model of a transaction-processing system
To aid the comparison of a variety of products, often documented with different terminology, a conceptual model of a transaction-processing system is presented (see Figure 2). The model does not represent any particular set of products; it is intended as a general pattern, to illustrate the principles of information and control flow underlying most products that attempt to meet the requirements defined above.

Figure 2. Conceptual model of a transaction-processing system (showing the data flow and control flow between the line handler, message responder, program scheduler, DBMS and journal)
Line handler
The line handler manages the network and presents a simple message interface to the rest of the system. It is the primary interface for the system controller. If it is well designed and powerful, it can significantly enhance the apparent resilience of the system by engaging in a sensible dialogue with terminal users when system faults occur. If it is located in a separate front-end processor, independence from mainframe hardware faults is gained, and high network resilience can be achieved by duplicating only the relatively cheap front-end processor.
Message responder
The message responder manages the input and output message queues, and should maintain a log of messages for resilience. There are three files that are critical for a sensibly resilient system, because warm restart from a system crash relies heavily on the preservation of these message states. Ideally, the option should exist to dual-record the message queues and, in the event of their total loss, a measure of reconstruction from the message log should be supported.
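The dual recording and log-based reconstruction described above might be sketched as follows; the class and method names are illustrative assumptions.

    # Sketch of a message responder that dual-records its queues and keeps
    # a message log from which a lost queue can be rebuilt on warm restart.
    class MessageResponder:
        def __init__(self):
            self.queue_a, self.queue_b = [], []   # dual-recorded queues
            self.message_log = []                 # survives loss of queues

        def enqueue(self, message):
            self.message_log.append(message)      # log first, then queue
            self.queue_a.append(message)
            self.queue_b.append(message)

        def warm_restart(self):
            # Prefer a surviving queue copy; otherwise rebuild from the log
            return self.queue_a or self.queue_b or list(self.message_log)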
Program scheduler
Whereas the message responder manages the mapping of messages to terminal users, the program scheduler manages the assignment of messages to processing programs. It also manages the queue of interprogram messages, which is one means of continuing a multiphase dialogue with terminal users. As with the DBMS, the program scheduler must operate in terms of success units. It is under a strict obligation to observe the same recovery principles as the DBMS; otherwise, it is easy for database updates to be lost or doubled up with certain types of failure. The marking of common recovery points in the event log and in the database journal is one important tool in achieving this object. It is a common problem that the coordination of concurrent updates between batch and TP programs is difficult, because the batch programs do not run under the same scheduler. Specific examples of this will be examined below, and it will be shown how the DBMS itself can contribute to alleviating this problem.
Invoked transaction programs run under the control of the scheduler, usually within some form of TP services envelope. Effective processing recovery depends upon the power of this envelope to isolate run units and to perform first-level diagnostics. It is common to regard run-unit misdemeanours as being trapped by a 'contingency handler'. The systems designer may be able to specify some of the features of this component. It is valuable for resilience purposes if the scheduler is able to maintain counts of transaction-program faults and, when certain thresholds are reached, either to disable a transaction type or to revert to an older, safer version.

DBMS
As stressed above, the TP monitor and DBMS should have a common philosophy with respect to contingency handling, success units and event recording on journals. It should be remembered that the DBMS may be being used concurrently by batch programs. Some of these run units may consist of a series of success units separated by checkpoints.

PRODUCT COMPARISONS

The transaction-processing products of ICL (on the 2900), Univac (for the 11/XX series) and Honeywell (for the 60/XX series) adhere to the range of principles previously described. The IBM series differs in two ways:
• the different approach of the mainline products, IMS, IMS/DC and CICS,
• the availability of other vendors' products (the IDMS DBMS and TP monitors such as Shadow II and Taskmaster).

IDMS AND TP OPTION ON ICL 2900
Transaction-processing option
The transaction-processing option (TPO) of VME/B supplies a means of defining a set of application programs (known as 'application virtual machines' (AVMs)), each of which processes transactions of one or more types. Variations in message traffic are catered for by the ability to specify the size of the pool of AVMs of a given type; this is equivalent to the number of concurrent invocations of that AVM type permitted. The pattern of events is shown in Figure 3. User-written procedures can be attached to 'user hooks' at critical points in the TP responder and the TP scheduler. These may perform such functions as message validation and contingency handling.
Figure 3. Transaction processing with VME/B (a message passes from the terminal(s) through the TP responder - message logging, message validation - and the TP scheduler - selection of AVM - to the AVM application code, and returns at message end through the TP scheduler and TP responder, with message logging and file-update journalizing)
TP recovery
Run-time errors within AVMs are trapped and processed by a contingency handler. If the error has occurred before the end of the application process, the action of the transaction upon Recman (record manager) files is reversed by the updates being discarded. If the error was transient (e.g. a system deadlock), the transaction is restarted automatically. If an AVM type commits a predetermined number of errors of a given type, further use of the transaction codes that it services is automatically inhibited. AVM failures after the application-code end can be 'recovered forwards', because file updates and output messages have been logged and can be automatically reapplied or retransmitted. A complete failure of the TP service and/or VME/B is treated upon resumption as a collection of AVM failures.
Multiphase transactions
Tasks involving several stages of dialogue between the terminal user and the AVMs are supported. Interim data can be stored between phases as 'partial results' on the 'transaction slot file'. No provision exists for maintaining database locks between phases, however.
Data management
The choice lies between the 'record manager' (known as Recman) and the integrated database management system (IDMS). Recman supports a conventional range of serial, keyed-sequential, indexed-sequential and hashed random organizations. Recman files are described in the file data description language (DDL), which is closely based on a subset of the Codasyl DDL proposals. Record subsetting using a DDL for virtual files (VDDL) is specified, but is not yet available. IDMS is derived directly from the Codasyl-based DBMS of the same name which runs on IBM equipment, and which is marketed by the US-based Cullinane Corp.
Recman recovery
File alterations are journalized as 'after looks' at the record level as they are made. General file corruption is overcome by the restoration of an old copy and 'roll forward' from the journal. File updates made during the course of one TP transaction are not applied until the end of the process; they are held in the meantime in virtual storage as 'delayed updates'. Records subject to delayed update are locked against concurrent access, and are released when the updates have been applied. File recovery from an application process failure is thus very simple, in that modifications not yet applied are simply discarded. However, the technique has drawbacks for long processes. The virtual store can become overfull with delayed updates, unless frequent pauses are made at which the updates can be consolidated. These pauses must coincide with checkpoints for recovery purposes. However, Checkpoint has significant processing overheads that militate against its too frequent use.
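A sketch of the delayed-update technique follows; the interface is a hypothetical illustration, not the actual VME/B one.

    # Sketch of 'delayed updates': changes made by a TP transaction are
    # buffered in virtual storage and applied only at the end of the
    # process; on failure the buffer is simply discarded.
    class DelayedUpdates:
        def __init__(self, files):
            self.files = files           # the real record store
            self.pending = {}            # record_id -> new value (locked)

        def update(self, record_id, value):
            self.pending[record_id] = value      # record remains locked

        def commit(self):
            for record_id, value in self.pending.items():
                self.files[record_id] = value    # apply at process end
            self.pending.clear()                 # locks released here

        def abort(self):
            self.pending.clear()         # recovery = discard the buffer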
IDMS recovery

Area locking
Alterations to database pages are applied as they are made. Copies of the page before and after the change are journalized as 'before' and 'after' looks. Updating run units may only lock complete database realms against concurrent access. Any run-unit failure requires manual invocation of roll-back or roll-forward utilities. High-concurrency updating is not feasible because of the size of the locked units (realms).
High-performance options

Page locking
The B150 release of IDMS on the 2900 under VME/B supports success units through the application of page locks to all data accessed; these locks are maintained through to an IDMS Finish or transaction end. Before and after page looks are journalized to tape, thus enabling roll-forward and roll-back recovery techniques to be used. 'Local' failures (i.e. those involving a failure of only one run unit) generally involve an automatic rollback of the database to the start of the current success unit, and presentation to the failing program of a choice of 'abort' or 'resume under control'. System failures involve interaction between the DBMS and the TP recovery file on resumption, so that
• the state of each incomplete transaction can be identified, so that recovery forward (with the sending of an already logged message) or rollback may be selected in each case,
• all user access is locked out until database consistency has been reinstated.
2900 IDMS offers batch programs the choice of area-level locking or success units compatible with the TP interface. Full batch checkpointing facilities, interfacing with the success units, are scheduled for a future release. Deadlock on the page-locking tables is detected by an algorithm, and one of the participating run units is selected to be rolled back and (optionally) reinstated. Slower database recovery from more widespread failure is supported by dump/restore and roll forward from a journal.
DMS1100 AND TIP ON UNIVAC 1100 SERIES
DMS1100, which is an example of a Codasyl-based DBMS, applies page locks to all data accessed, by means of KEEP and FREE commands. Success units are realized through the retention of locks until FREE or CLOSE. Before and/or after image copies of pages may be journalized on an audit-trail tape, and optionally a 'quick-before-looks' file on disc may be used for rapid recovery from run-unit failure. Concurrency can be controlled at the page and area levels. Deadlocks are resolved by rolling back the run unit of lower priority, using the temporary file of before images. Recovery points are logged at installation-defined points or on the request of a run unit. A utility program is used to save and restore files or areas of the database. This may be used for file reconstruction from a duplicate copy following a disc failure, and also to create backup dump copies of database files for recovery purposes.
IDS/II AND TDS ON HONEYWELL 6000 SERIES/LEVEL 66

IDS/II is a standard implementation of the Codasyl recommendations. It is integrated with the COBOL 74 language processor (but is written in PL/1), and it has high-level interfaces to query-language processors and report writers. File-access concurrency is controlled by locking/interlocking at the page level, but no KEEP/FREE commands are available. Thus, as with the ICL 2900, page locks are retained until transaction end. A journal is maintained automatically during update program runs, and provides a means of logging before and after images and other information necessary to retrace the file in the event of program or system error. Utility service routines are available in IDS for file dumping and reloading, restructuring and reorganization, and for dynamic and static statistics collection. The TPR (transaction processing routine) in TDS is reentrant (i.e. may be multithreaded), and TDS has its own automatic recovery/restart facilities, which roll back to the start of a transaction phase. There is no concurrent-access control, however, between TDS and batch jobs.
IDMS ON IBM 370

IDMS on the IBM 370 is a Codasyl-based product marketed and supported by a software house (the Cullinane Corp.). The product was the source of the ICL IDMS previously described. Although, in the developed ICL form, success units and integrated TP recovery are supported, this is not so in the currently available release of the product for IBM equipment. Locking against concurrent access is at the area level only; hence high-throughput TP updating is not feasible, and recovery by roll back or roll forward must be invoked manually. Thus recovery is open to manual error, and the MTTR cannot be reduced below several minutes. Given that rapid integrated TP recovery is not attempted, interfacing to a TP monitor is relatively simple, and depends on the characteristics of the individual product. It is achieved in IDMS through a centralized monitoring module (CAMP) and a generalized communications interface (GCI). Interfaces are offered by several common TP monitors, although recently a preferred product, IDMS/DC, has been offered by Cullinane.
IMS2 AND IMS/VS ON IBM 370

DL/1, the IMS data description language, is supported by a general-purpose DBMS that uses in turn a variety of physical access methods and data structures, and that particularly provides the DBA with facilities to control the access to, and mode of processing of, data (read, insert, update, delete). It provides for data independence and for protection of the integrity of the data against hardware malfunction by a comprehensive set of logging, backout and recovery facilities. DL/1 interfaces with application programs written in PL/1, COBOL or Assembler, and it has an online interface with IMS/VS-DC and CICS/VS. (IMS2 and IMS/DC will eventually be made obsolete by IMS/VS.) No protection is provided for shared access in batch mode, however. IMS/VS schedules all message-processing regions. Each terminal input is a transaction, and is prefixed by a transaction code defining the PSB (program) to process it. Conversations are supported via a scratch-pad area. IMS/VS has HOLD and FREE commands and applies locks at the segment-occurrence level. Deadlock resolution is achieved automatically by roll back. IMS provides good recovery facilities from database/system/program failure, including:
• dumps,
• logging routines,
• a change accumulation program to compress log tapes and sort images,
• recovery programs using a dump, compressed log, standard log, checkpoint and restart, and automatic recovery.
Batch message programs are synchronized with online programs. TP queues and the database are recovered in synchronization.
ACKNOWLEDGEMENT

The paper is based on part of the material for 'Integrity and recovery in real-time systems', a course given by Infotech International Ltd for those responsible for systems feasibility studies, systems design, and software and hardware selection for real-time systems, who already have an appreciation of transaction-processing and real-time system techniques. The course analyses the requirements for reliability, availability, data integrity and recovery in real-time systems, and examines the complementary roles of hardware and of applications and systems software in the design of fault-tolerant systems. It is also concerned with evaluating related design tradeoffs across the whole spectrum of real-time systems, from high-traffic, high-data-volume, fast-recovery mainframe systems to low-volume, slow-recovery, but still high-integrity, dedicated systems using minicomputers or microcomputers. It is designed to give the participant an understanding of the principles of fault-tolerant systems, and to show him how to ensure that the requirements are specified so that design options are prepared and reviewed at the earliest stage of a project. The course will next be held in London on 14-16 November 1978. For details of this and other real-time/data-communications courses, such as 'System design for teleprocessing', 'Software for teleprocessing and data communications' and 'Advanced networks and communications systems', please contact Infotech.