Are programmable controllers suitable for emergency shutdown systems?

Are programmable controllers suitable for emergency shutdown systems?

ARE PROGRAMMABLE CONTROLLERS SUITABLE FOR EMERGENCY SHUTDOWN SYSTEMB? 1 | I I Thomas G. Fisher INTRODUGT/ON Emergency shutdown sys~.~rnsare used ...

1MB Sizes 0 Downloads 55 Views

ARE PROGRAMMABLE CONTROLLERS SUITABLE FOR EMERGENCY SHUTDOWN SYSTEMB?

1

|

I I

Thomas G. Fisher

INTRODUGT/ON Emergency shutdown sys~.~rnsare used to provide for the safe operation of process units. Safe operation means safety of plant personnel, of plant equipment, and of the community surrounding the planL Magison [Ref. 1] defines safety as "freedom from danger" or "as an acceptably low risk that a system will injure workers, destroy the plant, or function in some other socially, economically, or legally unacceptable way". Emergency shutdown systems do cost money. However, according to a study by DuPont's Safety Management Services [Ref. 2], a disabling injury in the chemical industry costs an average of$18,650. This figure was based on National Safety Council data for 1985 and some data from selected client companies. They make the following statemerit: A cl~emical company with 1,000 employees could expect to have nine lost workday injuries. Given the estimated 3.7 percent Wofit margin of this industry, the company would need $4.5 million in sales to offset these injuries. Applying this perspective, safety makes good business ISSN0019-0578/90mJ0001/ID$2.q0© ISA 1990

sense. This means that more emphasis needs to be placed on the safe, reliable operation of manufacturing plants. Taunton of Allied-Signal makes the following statement

[Ref. 3]: Old levels of plant reliability are no longer acceptable. The cost of failure has gone too high. The concepts of keeping operations on-line, producing good product, eliminating catastrophic failure, and protecting our employees and communities is essential to be competitive-in all businesses today. However, safety cannot be absolute. It is always relative and relies on probability. Because there is always a probability that some activity will put life and property at risk, the following basic concept should be applied in the design of emergency shutdown systems [Ref. 1]: No single event should lie between the safe situation and a catastrophe. Some control systems being installed today may be inherendy unsafe. And they are being installed on systems, such as burner-management systems and chemical plant ISA Transactions •

Vol. 29, No. 2

1

and refmery interlock systems, where property and lives could be lost if something goes wrong. Programmable logic controllers (PLCs) are being used more and more in emergency shutdown system applications. But the answer to the question "Are Programmable Controllers Suitable for Emergency Shutdown Systems?" has to be both yes and no. The " n o " answer is true from a safety standpoint when standard off-the-shelf programmable controllers ate used. From an application standpoint, the "yes" answer is definitely true. This is because of the many advantages of PLCs in these applications; e.g., flexibility, the ability to integrate into computer-based systems, size, etc. From a safety standpoint, however, the "yes" answer is true only if special precautions are taken to ensure safe operation.

down; they signal their presence. They are called "revealed" faults or"overt" faults. These types of shutdowns can be dramatically reduced by using redundant elements within the system, because control is maintained if o n e element becomes faulty.

Fail-to-Danger (FTD) Faults Fail-to-danger faults are the most dangerous. They prevent the system from responding to hazard warnings, allowing ha7~rds to develop. They are called "unrevealed" faults or "covert" faults, because they can remain undetected until revealed by testing. Otherwise, the system will not be available when a demand arises. With testing, very high degrees of protection can be achieved, and the shorter the time interval between tests, the smaller will be the probability of two faults existing in different elements of the system.

DEFINITIONS Availability Fail-Safe

A useful definition of system availability, A, is:

Bryant provides a definition of a fail-safe control system [Ref. 4]:

A = Uptime/Total Time

A fail-safe control system is one whose outputs are de-energized (or neutralized) when a component or circuit failure occurs in the control loops associated with a particularoutput. This means that when there is a failure, the system will go to a safe position. A system that is designed to operate in the "normally energized" mode will inherently fail safe on loss of both electrical power and instrument air.

Availability is the probability that the system is working throughout the total mission time. If it is always working, the availability is 1.0. Availability is a reliability parameter that considers both mean time between failures (MTBF) and the mean downtime (MDT). MTBF is a term used to indicate the expected life of a system or piece of equipment before a failure occurs. MDT is really a summation of the mean time to diagnose the prezence of a system fault (MTDF) and :he mean time to repair (MTTR) the fault as:

Normally Energized ?'_,de With the "normally energized" mode of operation, inputs are energized under normal operation and deenergized under shutdown conditions. This means that a shutdown is initiated when the input contact opens, if the wiring breaks, a connection comes loose, etc. Output devices are also energized under normal operating conditions. They ate deenergized to go to the shutdown condition.

Normally Deenergized Mode This mode of operation is just the opposite of the norreally energized mode. Now inputs are energized to initiate a shutdown. Output devices are also energized to go to the shutdown condition. This mode of operation is normally used when continuous operation of the equipment is impor. tant and unwanted shutdowns must be minimized.

Fail-w-Safe (FTS) Faults Fail.to-safe faults result in an immediate system shut-

2

ISA Transactions • Vol. 29, No.2

MDT = MTDF + MTTR Availability c ~ also be def'med in terms of the mean time between failures ( M T B ~ and the mean downtime (rvDT) as: A = M T B F / ( M T B F + MDT)

Heron [Ref. 5] points out that MTTR can be broken down as follows: MI'FK = MTDL + MTRF + MTRO where MDDL = mean time to determine a fault location, MTFR = mean time to replace a faulted component, and MTRO = mean time to return the system to operable condition.

The MDT values can be developed from considerations of: • the location of repair personnel,

• the average repairman's skill level,

Redundant systems have individually specified duplicale components and manuai er automated means for detecting failures and switching to backup devices. Packaged fault tolerant modules have internally redundant parallel components and integral logic for identifying and bypassing faults without affectiag the output.

• the ease of diagnosis of the system fault, and • the accessibility of spares. MDT values can then be combined with the vendor's • quoted MTBF numerics to produce availability values for a module. For FTS faults, MTDF = 0, because the fault is selfrevealing. Thus,

Fault tolerant and redundant systems are used for cases as noted below: • System availability is critical. The nuclear industry has obvious requirements for high availability. Bad outputs cannot be tolerated. This might be the case for certain chemical reactions that can become violent if they are not controlled correctly.

A = MTBF/(MTBF + M T I ~ ) For FTD faults, MTDF is most important and often determines the overall availability, because this term is usually much larger than the MTIR. Therefore: A = MTBF/(MTBF + MTDF) A definite distinction must be made between "availability" and "fail-safe". Availability is calculated based on any failures that require the system to shut down. However, no distinction is made between failures that cause the system to go to a safe stale (FTS faults) and failures that cause the system to go to a dangerous state (FTD faults). Availability is just a measure of how often a system fails. A failsafe system fails in a known predetermined manner, to a safe stale. High availability is important because the system is less likely to fail, to a safe stale or a dangerous state. Fail-safe is important so that failures that do occur don't become disasters. Some systems that maintain availability do so by sacrificing their ability to operate safely. The desire in most emergency shutdown systems is to have a system that is both fail-safe and has high availability. This means that the emergency shutdown system is extremely reliable to protect the environment, personnel, and equipment. It also avoids false trips and resulting economic penalties. Redundant and Fault Tolerant Systems Redundant and fault tolerant systems can be used to make systems more fail-safe and/or increase the availability. But some people use the terms "redundancy" and "fault tolerant" inlerchangeably. Distinctions can and should be made [Ref. 6] as noted below:

Continuous operation is required. This may be a large compressor system in a refinery where there is a large cost penalty because of production losses if the equipment shuts down. Normal practice in these types of applications has been to use normally de-energized systems. However, then the system is not fail-safe. The system is installed in a remote location where maintenance is a problem. This may be because the equipment is unattended, e.g., electrical substation control or gas pipeline control. It may also be because the equipment is installed in an undeveloped country where skilled personnel are scarce.

FAIL-SAFE ASPECTS OF PROGRAMMABLE CONTROLLERS The equipment used in the emergency shutdown system should fail in a safe direction. If the emergency shutdown sys-:em uses standard, off-the-shelf programmable controllers, it is inherently unsafe. This is because of the potential for internal PLC failures and failures in triacs or SCRs that are used in output modules to energize field equipment. A PLC failure can cause a complete control system shutdown. PLCs can also fail in an undeterminable mode. Programmable controllers are also prone to internal failures that can affect the scan cycle. It is very important that a PLC failure be detected and that the proper shutdown seq,ence be performed. Tfiacs and SCRs are inherently NOT fail-safe. They do not fail in a predictable mode; they can fail in the ON condition. This can cause output devices to become energized even though the output is supposed to be in the OFF condition. Special precautions must be taken when using such devices. ISA Transactions • Vol. 29, No. 2

3

tem power up) to check for errors. Checksum systems are more efficient than parity checking.

Some cures have been developed that some feel get around the inherently unsafe characteristics of solid-state logic devices; for example, (l) Use feedback monitoring loops from the field devices. This feedback principle relies on a detection or comparison circuit to either alarm or shut down the system when there is a discrepancy. And the detection circuit itselfcan fail unless it is fail-safe. But feedback monitoring circuits are usually composed of more solid-state devices, which can make the system more complex and less reliable, and it does not solve the fall-safe problem. This does not mean that output feedback should not be used. It is a very desirable feature on most emergency shutdown systems. It just means that feedback in itself does not improve the fail-safe features of the solid-state device. (2) Use electromechanical relays for the output switching device. This also does not make the system failsafe. When relays are used as output switches with solid-state systems, they are usually driven by a solid-state driver, such as = u,,,~. These solid-state devices can fail so the relay becomes energized.

Even with the above checking features, a problem can arise if these checking functions are built into the software. They may then be subject to similar types of errors. A recent article [Ref. 8] states that these types of self-tests will only detect 70 to 80% of all faults. One solution is to build an external watchdog timer circuit. This watchdog timer monitors an output or a number of selected outputs to make sure they are constantly being scanned. If not, then that same watchdog timer can be used to effect an orderly shutdown of the system. An example circuit, shown in Figure 1, uses a motion detector. A PLC output is programmed to cycle ON and OFF periodically. A time delay is preset into the motion detector. The motion detector output is energized as long as this output toggles ON and OFF within the preset time. If the PLC output remains in the ON or OFF state and does not cycle, the motion detector output de-energizes. This output can be used to turn off power to critical outputs and/or sound an alarm.

II I

I

PLC Logic System Most programmable contrc,aer manufacturers build faultdiagnostic systems into their PLCs. This is especially true of newer microprocessor-ba;ed systems. Some methods [Ref. 7] that are incorporated are given below:

(1) Watchdog timer. This is basically a device that checks to make sure that the PLC is going through its normal scan. It is usually located at the end of the program. If it is not serviced during each scan, the PLC can be shut down according to a predefined program. (2) Parity checks. Parity bits are added to each word and are constantly checked to see if a memory., location fails or if an error has occurred in the program.

(3) Checksum. With this technique, the corresponding bits in a word are added from column to column digitally without carry. This forms a word that uniquely represents that particular word. A bit change in any word will change the checksum. The PLC constantly computes the checksum and compares it with the base checksum (usually determined on sys-

4

ISA Transactions • Voi. 29, No.2

I L:

!

J

L~

1.1:101~1~1 I

I

I I

t

Y TO PC output (lnDut Dulse)

Contacts yeom--~ ICR NtreO to c r l t l C 8 | OUtDutS 8nO/or annunciator

Figure 1. External Watchdog Timer

This watchdog circuit should be set up .so that it checks to make sure that input devices are working, the PLC scan is working, logic is actually being solved within the PLC, and the output devices are working. In systems where inputs and outputs are located in racks, it may be necessary to utilize inputs and outputs from each rack in the watchdog circuit to prove communication between the central processing unit and the I/0 devices. In PLCs where memory is

added by adding additional memory cards, it m y also be nec_ _essary to make sure that the watchdog circuit is solved in all parts of tbe memory.

RESET buUon has boca ~ It is good Inctice to batda manually oSxnted ou-off se2ector s w i ~ between tbe output interface device and the final output d e v ~ so that the tirol output device can be r e , n e d to a safe pe6itiou if the PLC fails.

Output Devices Many output interface devices use triacs to switch the load. Triacs are very sensitive to overvoltag© damage. Surge suppressors should be installed across tbe triac in any case where a mechanical switch is installed in series with the load or in parallel with the output module as shown by Figure 2. Triacs also have a minimum load requirement for the uriac to turn ON. Some loads azc not high enough to supply this minimum load, and an artificial load must be sl.~)pfied. This problem can occur when a P I ~ output is direcdy driving a solid-state input device on another PLC or computer.

PSMH-I Reset

0

I

0 l~a

IC;Z

II

-

I In



xyl

Out

Logzc $ystes

L2

suppressor

,

Figure 3. Hard-Wired Backup on Critical Shutdown

-

PC output NechanZca) Contact zn Serzes wzth LOafl

Suppressor

-

I

I

(]) Allen-BradleyO Protected Output Module. This

PC output

----t

i ~

Look at all the outputs to see what happens to these outputs when a fault is detected in the PLC. Very often the outputs are turned OFF when a fault is detected. But not all outputs are safe when turned OFF. It may be desirable for the PLC to maintain the "last state" on a fanlL As previously mentioned, when triacs fail, they often fail in the ON slate. Some PLC manufacmre~ have introduced output modules to try to make these outputs more fail-safe. Two examples are given below:

..°ran,., Contactan Parallel . l t h Output

module incorporates a protection circuit that d e ~ a shorted triac and trips a fuse in the output circuit that turns the output OFF. (S¢¢ Figure 4.)

Figure 2. Surge Suppressors for Triacs

For extremely hazardous ~ituafions, critical shutdown devices should be hard-wired externally to the solid-state logic system. For example, an extremely critical high pressure switch might be wired direcdy to a reaction feed valve. This switch is hard-wired in this manner and wired into the logic system for alarm and lockout purposes using an interposing relay as indicated by Figure 3. Lockout means that once the switch has tripped and closed the valve, the valve will not reopen until the switch has reset and a

o

~t:ut

Figure 4. AHen-Bradley~ Protected Output Module

ISA Transactions •

Vol. 29, No. 2

5

(2) Texas Instruments@ Redundant Output Module. In this module, each output circuit is backed up by an identical secondary circuit. If the module detects a shoa or open condition or a half-wave triac failure, it switches to the secondary output. If a failure is detected in the secondary output, the system disabics all outputs. (See Figure 5.)

Figure 5. Texas InstrumeatsO Redundant Output Module

Input Devices A major problem can occur with input interface devices. An input voltage of 60 to 70 volts is enough to turn ON most 115-V ac input devices. A field signal wire to an input can easily pick up this much induced voltage in long runs of wiring when it is installed in conduit or tray cable with other I/O. Induced current is not a problem with elcctromcchanicai relays, since the current flow required to pick up the coil is usually more than the current that can be supported by the induced voltage. With some high impedance input devices used on some PLCs, the current draw of an input can be as low as 9 or 10 mA. The result is that inputs can get turned ON by the induced voltage even when the field contact is open. One solution is to install a resistor between the input and ground that is sized to achieve the required current flow. Remember that input switches, e.g., limit switches, pressure switches, level switches, etc., are not all normally open (N.O.) devices. Use N.O. or N.C. (normally closed) as needed for the greatest safety. Decide which sv,i!,'hes cover the most unsafe conditions and make them normally closed. The most common failures are either a broken wire or contact that fails to close. Ifa wire or switch fails, the PLC will sense an unsafe condition. In extremely critical areas where safety is of extreme importance and both positions of a cylinder or other" device are potentially unsafe, wire the N.O. and N.C. contacts to two separate inputs. Then require that both of these contacts be in their correct positions 6

ISA Transactions . Vol. 29, No.2

before allowing the machine to proceed. In particular, STOP circuits must be wired using normally closed push buttons.

Program Design Problems can occur when changes are made to a program while the machine is running; e.g., changing the preset of a timer. On-line programming is especially dangerous, although it is a very desirable feature during checkout. Some type of keylock device should Ix) installed to prevent online program changes by maintenance and operating personnel. The ease of programming also makes it possible for unauthorized personnel to change programs. Allowing maintenance and operating personnel access to a PLC can cause problems. Even though they are skilled in their particular job area, they may iack the knowledge to program a PLC safely. Even if they know how to program the PLC, they may not know the procedures followed in your planL The use of PROM or EPROM memory for program storage offers the only effective protection against unauthorized tampering. Many PLCs allow inputs and outputs to be forced ON or OFF. This means that inputs and outputs can be turned ON or OFF manually, regardless of process conditions. Forcing is a very desirable feature during checkout and maintenance of a system, but it is very easy to forget and leave inputs and outputs in a forced condition. Also, forcing I/O in one section of a program can cause disastrous results in another section of the program, i/O forcing should not be used while the system is operating. Safety can be increased by the coding of the program itself. One of the biggest problems is in the use of latched outputs, i.e., outputs that slay ON once they have been turned ON, even after a power interruption. Although latches work well for storing error codes or for batch stepping (sequencing), some problems can arise when latched coils are used to control real world outputs. A la~+ched output that comes back ON when the PLC is powered up can cause serious or dangerous machine motion to occur immediately. Outputs that cause any machine movement should be programmed with non-retentive outputs. If these circuits must bc locked in, use a holding circuit and seal-in contacts as is done with relays.

REDUNDANT SYSTEMS An example of redundancy is shown in Figure 6 [Ref. 9]. The arbitrator determines if the running controller has failed and makes an automatic switchovcr. The arbitrator could be a PLC, a computer, or other separate piece of equipment. In this case, the backup controller makes a cold start after

Monitor anO Control

Arbltrator iF Swltch Control = .

I Programmable

Programmable Controller

The two systems would then ~ independently. The communicator is a high-speed processor-to-lnoce~-or communication link. It allows the primary processor to update the memory of the secondary processor on each scan to minimize loss of data during the switchover. ~[he communicator may be a separate device or it may be two cards, one plugged into each PLC.

Contr~ 1ler

! Prolle~le cll~ltP|l l i e

I w e ~ ' • ~ m le I~all~l i l i e



|

I, Automatacally ODerateo

I

I ...... I I

1,

i I

I!

I/O CarOs

IIIII

lille

Ingut

f l e l O 0utl~ut

Figure 6. Redundant Processors

Figure 7. Redundant PLC System

failure of the first controller, or some data will probably be lost during the transfer. Bumpless transfer probably can't be expected unless the internal data storage of the backup machine is updated each scan. This is possible only when there is a high-speed communication link between the two processors. This communication is usually accomplished by installing a communication card in each PLC. There is no switch or ~bitrator. This system is commonly called a "hot backup" system by PLC manufacturers. One of the PLCs "~s the primary controller, and it controls the status of the outputs at any given moment. If a failure occurs in the primary controller, the secondary controller switches in and takes over the control function. The primary controller is taken off line. Although these systems appear to be safe, their availability can actually be less than using two separate controllers without the automatic switchover. Failures in the arbitrator a,u switch can affcct both the other PLCs. They ~,',, called common mode failures. Likewise, a failure in one of the communications cards can prevent the system from operating correctly. Any kind of interconnection between two PLCs may allow a single failure to disrupt the whole system. Figure 7 shows a redundant system with redundant I/O. The system can be configured without the communicator•

Having communication between the two processors means that each processor can now monitor the other processor's performance. Therefore, PLC software can be developed that will diagn:,~e system faults such as: • individual input or output module failure, • field input signal miscomparison between the redundant systems (the same input signal detected differently by the two systems), and • field output signal miscomparison. However, it does introduce a potential common mode failure point, and there is no guarantee that when the transfer is made, good information is avai:able in the primary controller. If not, then the system will still go down, or, worse, the backup controller will try to operate using bad dam There are two ways to wire the outputs in a system with redundant I/O: series or parallel (digital outputs). The series hookup shown in Figure 8 minimizes the danger of a triac's failing ON and affecting the output, since both uiacs would have to fail to cause a problem. This is called oneout-of-two (1oo2) voting on the output, because either output can de-energize the output device• It increases the ISA Transactions •

Vol• 29, No. 2

7

number of trips c ~ s e d by FTS faults but considerably hardens the system against FTD faults. For example, if the triac in one of the o~put modules fails in the ON position, the second output F~~dule can still open and de-energize the output device. Ho~ vet, if one of the triacs fails in the OFF position, there is nc~ way for the second output module to energize the output :~vice. The system will shut down.

(1) Sense hardware failures, annunciate the fault, and shut down the system in an orderly manner. (2) Sense noncritical failures, annunciate the fault, and provide for user logic intervention. (3) Maintain highly reliable communication with the I/O and annunciate communication failures.

FAULT TOLERANTSYSTEMS

.]

.... .

.

.

.

.

.

.

.

.

Figure

.

.

.

Series Hookup of I / 0

The parallel ho~: shown in Figure 9 minimizes the nuisance shutdowns :~m the triac's failing open, since the other controller can ~ l l energize the output. But it does increase the possibi~::~:y of an unsafe condition if the triacs fail ON. This is call~ two-out-of-two (2002) voting on the output, because both outputs have to turn OFF before the output device will be ~e-enexgized. This system practically eliminates trips cau5 ~:~by FTS faults, but it increases the hazardsofFTD fau1~ Frequent testing is required with this type of system.

Considerable emphasis is now being given to fault tolerant computer systems. If a system is fault tolerant, it can withstand an internal failure and still function. Commercially available systems for process control applications have three identical microprocessors. If one unit fails, the other two will continue to operate. Each module within the system performs critical functions simultaneously, and the results are compared to the results of the other units using a majority voting system (best two out of three). Normally all three outputs will be identical. If one unit fails, the results will be determined by the other two units because the two of them are the same and they outvote the failed unit. Two examples of fault tolerant systems are shown in Figure 10 [Ref. 11] and Figure 11 [Ref. 12]. Faults in these systems can be identified and bypassed by rejecting one value if it doesn't agree with the other two. The fault tolerant system allows first faults to be bypassed without introducing switching transients and helps eliminate nuisance trips. Separate voting modules are used to compare the outputs before p~sing commands to the process.

m ~ m

P~Im ~sm

(neret|ee FtBIe ~tgn

Figure 10. Fault Tolerant System with Independent Channels The system in Figure 10 uses three independent chan. Figure ~ Parallel Hookup of I / 0

Having tedundan~ :~ystems is useless unless the systems have excellent builg~ ~ fault diagnostics to detect errors in the systems. E~,:.h system faust be able to [Ref. 10] do the following: 8

IS/, TransactL~ns • Vol. 29, No.2

nels, each running the same logic, and each having its own input and output modules. The outputs from these three channels go to a voting network where they are voted 2003 using hard-wired electromechanicai relays. Watchdogs and automatic testing (to identify FTD faults) are incorporated in this system. A failure anywhere in one channel of this system renders that channel inoperative.

] Terminatio]ns

!roc 5sot

present. Also. the failed module caa be chmged while Iill maintaining control of the load. The ultimate system in terms of fail-safeness and availability would be achieved by using triplicatedfield sensors in conjunction with fault tolerant system and tripficated output devices. The output devices are usually not triplicated because of the high cost involved. However, in critical systems, the cost of having two output devices may be justified. These output devices can be connected in series (see Figure 13) or parallel (see Figure 14) d , . ~ J i n g on whether it is more important to stop the flow of fluid (e.g., steam to prevent overheating) or to maintain the flow of fluid (e.g., cooling water to prevent overheating) [Ref. 13]. Some advantages to be gained with a woperly designed redundant or fault tolerant system [Ref. 10] are as follows: (1) Control of the process continues in the event of a failure in one of the systems. (2) A failure of a module in one system will not affect the other system(s).

I

,,

(3) Maintenance can be performed on one system while the other system(s) is still in operation. This means that a faulty unit can be replaced while the system is

Figure 11. Triple Modular Redundant Fault Tolerant System ¢4BVm

The system in Figure 11 does the voting at two points. The inputs are voted 20o3 before they go to the controllers. If one input does not agree with the other two, it is voted out. This allows the system to identify failures as early as possible. The outputs are then voted 2o03 before they go to the field devices. This type of fault tolerant system is often called a "triple modular redundant" (TMR) architecture. A f~tult in one channel of this system doesn't mean that the whole channel is inoperative. For example, a failure in one input module will not prevent the unit from correctly processing, using the other inputs, and carrying out outputs. The final output module must also be fault toleranL This can be achieved by connecting two fail-safe outputs in parallel. Figure 12 shows a Triplex@ fail-safe output module. With this module, the load is controlled by two driver circuits, both of which must be conducting the pass current. Both drivers are tested every second to see if they have failed. Any discrepancy causes a time-out and signals a fault to the system. With two fail-safe modules strapped together (outputs in parallel), it is possible to control the load, both ON and OFF, even with a failure condition

Figure i2. TriplexO Guarded Digital Output Module

FC

FC

Figure 13. Redundant Shutdown Valves, Series Configuration ISA Transactions •

Vol. 29, No. 2

9

still running. This requires careful design of both the hardware and software. (4) Each system can be used to monitor the other(s) for hardware errors. (5) An independent system (i.e., a host computer) can be used to monitor the systems and report system errors.

CLASSIFICATION OF RISKS The emergency shutdown system is generally the backup system that takes over when an unsafe situation occurs and takes the process to a safe state. The alarm and interlock system oversees the operating plant and the process control system to make sure that a failure in either of those systems does not cause a hazardous situation. This means that inactive alarms, defective safety devices, and poor maintenance of these systems and of operating equipment cannot be tolerated. However, the user must determine what the emergency shutdown system is designed to protect; e.g., the community, plant personnel, plant equipment, etc. These can typically be divided into classes as shown below, in order of decreasing severity [Refs. 14 and 15]:

• Class I - Environmental Risks (These can affect the community and plant personnel.) • Class I I - Employee Safety Risks (such as from an explosion) • Class HI - - Major Equipment and Production Cost Risks (These typically can cause costly business interruptions•)

important, then the redundant PLC (with redundant I/O) may be a good choice• F-ach of the PLCs in this r e d u n ~ t configuration should be equipped with external watchdog timers. Fail-safe outputs should also be used. Single PLCs, even if they are equipped with external watchdog timers and fail-safe outputs, are not suitable for first-level safety systems. This is because the normal fault detection techniques, even using external watchdog timers, are not 100% effective at detecting faults (Ref. 9). Class IV and V risks can be grouped into a category called "Second-Level Safety". It is difficult to justify either the cost of fault tolerant systems or completely redundant PLCs for this level of risk. A single PLC with an external watchdog timer and fail-safe outputs may be suitable for these applications. Where redundant PLCs are used, usually only the processor (CPU) is redundanL

CONCLUSIONS It is possible to design fail-safe emergency shutdown systems using programmable controllers, but special attention must be given to these systems. A more thorough analysis is necessary with these systems than is necessary for relay logic systems. This is especially true when high availability is also a requiremenL But the advantages of PLCs over relays are so great, particularly if the logic system is complex, that it is well worth the extra time and efforL Note that this discussion considered only the programmable controller itself, along with its associated input and output modules. A complete analysis would also include the field devices, both inputs and outputs, and the system power supplies• More details on complete systems can be obtained in the papers by Martei [Ref. 14] and Maggioli [Ref• 15].

• Class iV - - Minor Equipment and Production Cost Risks . . . . . . . . . . . . . . . . . . . . .

. i I i i i i

PLC

• Class V - - Product Quality and Product Handling Risks (An example might be mixing two different products together.)

FO

• RECOIflMENDATIONS

When emergency shutdown systems are implemented, Class I, II, and Ill risks are often grouped into a category called "First-Level Safety". When the eme~rgency shutdown system must be fail-safe and have high availability, then fault tolerant systems appear to be the best choice. When the system must be fail-safe but availabillity is not so

10

ISA Transactions • Vol. 29, No.2

q

:

FO

Figure 14. Redundant Shutdown Valves, Parallel Configuration

REFERENCES 1. Magison, E. C., "Make Sure Your System is Safe," Instrument & Control Systems (I&CS), December 2. 3. 4. 5.

6.

7.

8.

(1979). "Improved Safety Performance Is Good Business," Chemical Engineering Progress (CEP), July (1987). Taunton, L. R., "Fighting Chemophobia," CEP, May (1987). Bryant, J. A., "Design of Fail-Safe Conurol Systems," Power, January (1976). Heron, R. L., "Critical Fault Control System," lresented at Florida Municipal Utilities Association 21 st Annual Engineering & Operations Workshop, FL Pierce, Florida, November 4-6 (1986). Krigman, A., "Reliability Enhancement Techniques: Can You Afford To Use Them.'?Can You Afford Not ToT' InTech, July (1985). Benedetto, J. A., "Preventing Operational Errors in Programmable Controllers," ControlEngineering, July (1981). Humphrey, J. A., "Applying Fault Tolerant System Architectures," IC&S, October (1987).

9.

10.

11.

12. 13.

14.

15.

Cherba" D. M., "'Redundancy in Programmable Controller Systems," Instmnems & ControlSystems, April (1983). Sykora, M. R., "The Design and Application of Redundant Programmable Controllers," Control Engineering, July (1982). ICS-MESD2000, "Microprocessor-Based Emergency Shutdown," Industrial Control Services, Houston, (1984). Machulda, J., "Fault Tolerant Control: You Can Afford It," lnTech, September (1985). Smith, S. E., and J. A. Humphrey, "Methods of Applying Fault-Tolerant Programmable Controllers," Source Unknown. Martel, J. T., "Design of a Safety Shutdown System," Presented at the Fifth Process Computer Users Forum of the Chemical Manufacturers Association, May (1986). Maggioli, V. J., "The Safety Matrix," Presentcd at the Fifth Process Computer Users Forum of the Chemical Manufacturers Association, May (1986).

lSA Transactions • Vol. 29, No. 2

11