ISA Transactions 32 (1993) 375-380 Elsevier
Process computer audits
Tom Lagana
Hercules Incorporated, Wilmington, DE 19894, USA
Even though a process computer system is designed with the best technology, the safety of the system can be compromised in its installation, operation, and management. For this reason, process computer audits should be performed. This paper concentrates on the items a process computer auditor should inspect when performing an audit. The areas included in the audit are fire protection, the environment, electrical power, ergonomics, fail-safety, alarms, safety interlock systems, security, design change control, contingency plans, maintenance, training, operations, and documentation.
Introduction
A process computer audit should ensure that the minimum safe automation requirements for all new or upgraded automated process control systems are implemented. Specifically, this includes the planning, design, construction, operations, and maintenance of process control systems used for the automation of hazardous operations. The purpose of auditing process computers is to avoid risk to human life and the environment, damage to major equipment, loss of production, and loss of business. It is necessary that policies and procedures be in place to ensure that these risks are avoided.
Process hazard levels
The requirements for safe automation of processes depend on their associated hazards. These requirements should be determined by a hazards analysis that is reviewed and approved by an appropriate plant committee. Management's responsibility includes ensuring that the policies and procedures are adhered to and that routine audits take place. Each requirement should be classified according to the process hazard levels. For example:
- High hazard: Loss of process control has a high probability of resulting in a catastrophic incident (i.e., fire, explosion, personnel injury, or environmental release).
- Medium hazard: Loss of process control has a high probability of resulting in major equipment damage or production loss.
- Low hazard: Loss of process control has a high probability of resulting in minor equipment damage or production loss.
Policies and procedures should be consistent in their terminology and definition of terms; for example, the word "should", meaning that the stated criteria are advisory, and the word "shall", meaning that the stated criteria are mandatory. The words "must" and "will" used in conjunction with "should" and "shall" may lead to confusion. For the purposes of this paper, all recommendations are stated with the word "should".
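As an illustration only, the three hazard levels above could be recorded in a form that audit tooling can check. The sketch below is not from the paper: the category keys and the `classify` function are hypothetical names, and the mapping simply restates the three definitions.

```python
from enum import Enum


class HazardLevel(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Hypothetical category keys; each maps to one of the three levels defined above.
_CONSEQUENCE_TO_LEVEL = {
    "catastrophic_incident": HazardLevel.HIGH,   # fire, explosion, injury, release
    "major_damage_or_loss": HazardLevel.MEDIUM,  # major equipment damage or production loss
    "minor_damage_or_loss": HazardLevel.LOW,     # minor equipment damage or production loss
}


def classify(consequence: str) -> HazardLevel:
    """Return the hazard level for the likely consequence of losing process control."""
    try:
        return _CONSEQUENCE_TO_LEVEL[consequence]
    except KeyError:
        # Refuse to guess: an unclassified consequence needs a hazards analysis.
        raise ValueError(f"unknown consequence category: {consequence!r}") from None
```

Raising an error for an unknown category, rather than defaulting to a level, keeps the classification decision with the hazards analysis and the plant committee.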
Fire protection
Correspondence to: Tom Lagana, PE, Senior Staff Engineer,
Hercules Incorporated, Hercules Plaza, Wilmington, DE 19894, USA.
Fire detection and automatic suppression should be provided to safeguard all process control systems. This includes input/output (I/O)
0019-0578/93/$06.00 © 1993 - Elsevier Science Publishers B.V. All rights reserved
cabinets and operators' consoles distributed throughout the process in remote locations. Fire detection systems should alarm locally and in an occupied remote location. Power from all equipment should be removed before the suppression system activates. Smoke detectors that alarm locally or in a remotely staffed area should be installed in all control rooms, electrical panel rooms, and adjoining rooms where significant potential of fire exists. Control room overhead and under-floor areas containing flammable materials should have automatic fire-detection and fire-suppression systems. Process control rooms should have manual emergency power-down controls. This includes control cabinet power, UPS, heating, ventilating, and air-conditioning systems. In air systems, dampers should close when a fire is detected. The automatic shutdown of process control power should be considered when fire or smoke is detected. In systems that have emergency power-down controls, the controls should be located at control room exits and/or at other easily accessible remote locations. Process control systems should have a compatible, portable fire extinguisher located nearby. Cable coverings should be of the high-burn-resistant type. All holes or cracks in process control room fire-rated walls should be sealed with an approved "fire stop" material.
Environment
Factors such as temperature, humidity, vibration, and air quality should be considered to ensure the safe and reliable operation of process control equipment. If a potential for damage exists from overhead pipelines, adequate protection should be provided for process control equipment. Process control electronics should not be located in or near areas with a corrosive atmosphere. When this cannot be avoided, components should be suitably protected. Control rooms and cabinets designed to be pressurized or purged should be in compliance with appropriate National Electrical Code (NEC) requirements. The
only acceptable backup source of instrument air is compressed air (not nitrogen or inert gas).
Electrical power
Process control installations should include an uninterruptible power supply (UPS) system. If there are redundant power supplies and one fails, the remaining ones should be able to keep the process operating or provide the necessary power to enable an orderly and safe shutdown. Control systems that are maintained on backup power should be designed to provide for an orderly restart when power is restored. Incoming control system power should pass through a dedicated power panel and be properly labeled. Isolation of the control system power and signal wiring should be provided to minimize interference from spurious energies. Power and instrumentation cables, cable trays, and conduit installations should be protected from relief valve discharges, weather, vibration, fire, heat, moisture, and acid. Protection devices should be installed to protect electronic circuitry against lightning, power surges, and power spikes. Computers and other sensitive equipment should have a single-point grounding system in accordance with the vendor's recommendations, provided it does not violate the NEC (see NEC 250-26 c). Lead-acid batteries that are not sealed should be located in a ventilated area in a separate room from the process control equipment. Barricades or cages should be constructed around emergency power battery racks to protect them from being shorted and for personnel protection. Relay coils connected to electronic control equipment should be equipped with surge-suppression devices to protect the electronic controls when the relay is de-energized.
Ergonomics
User-friendly displays, screens, and keyboards should be used to minimize human error. Color-coding on human/machine interfaces should be consistent. A method should be used to differentiate among various alarm priorities (i.e., sound, color, print, flashing). Alarms should readily alert
the operator to unsafe conditions, no matter what graphic screen is being viewed. Measurements on graphic displays should indicate the engineering units. All control switches and keyboard keys should be clearly labeled to indicate their function. The number of keystrokes necessary for the operator to move from one screen to another should be minimized. Screen update times and change times should be a factor in the original selection of the control system. The height of the operators' screens, keyboards, and chairs and the placement of windows and lights should be taken into consideration. Nonglare lighting levels of 50 foot-candles at a 36-inch elevation are recommended in control rooms where cathode ray tubes (CRTs) are in use.
Fail-safe
Process control systems should be designed to fail in a process-safe state. Failure states of process control systems should be tested and documented before initial startup and following modifications. Programmable electronic systems should employ an external watchdog timer and alarm to detect system faults.
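The watchdog principle can be sketched in a few lines. This is a software model of the logic only, under stated assumptions: in practice the watchdog recommended above is external to the programmable system it monitors, and the class name, method names, and timeout value here are hypothetical.

```python
import time


class Watchdog:
    """Trips if kick() is not called within timeout_s seconds."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock            # injectable so the logic can be tested
        self._last_kick = clock()
        self.tripped = False

    def kick(self):
        """Called by the control program on every healthy scan."""
        if not self.tripped:           # once tripped, later kicks are ignored
            self._last_kick = self._clock()

    def check(self):
        """Called by the independent monitor; returns True once tripped."""
        if self._clock() - self._last_kick > self.timeout_s:
            self.tripped = True
        return self.tripped
```

When `check()` returns True, the monitor would raise the alarm and drive outputs to their documented failure states. The trip is latched, so a processor that resumes kicking after a fault does not silently clear it.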
Alarms
Alarms should be limited to only those necessary for a smooth, safe operation. Malfunction alarms should be used to monitor UPS systems, redundant power supplies, operating processors, and backup processors. In the case of redundant processors, the backup processor should provide a bumpless switchover in the event of a primary processor failure. Programmable electronic systems should document when alarms were triggered, acknowledged, and cleared. A method of annunciator testing should be provided. Annunciators should have a lock-in feature to continue the alarm state until acknowledged. Control systems should be designed with smart alarms or special help screens to indicate correct operator action for critical conditions. Critical
final control elements should be monitored for proper actuation.
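The lock-in feature and the record of triggered, acknowledged, and cleared events can be illustrated with a small state model. This is a minimal sketch, not a description of any particular annunciator: the class, its field names, and the example tag are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LatchedAlarm:
    tag: str
    active: bool = False           # process condition is currently in alarm
    unacknowledged: bool = False   # lock-in: stays True until acknowledged
    events: list = field(default_factory=list)  # (timestamp, event) tuples

    def update(self, in_alarm: bool, now: float) -> None:
        """Feed the current process condition into the alarm."""
        if in_alarm and not self.active:
            self.unacknowledged = True
            self.events.append((now, "triggered"))
        elif not in_alarm and self.active:
            # Clearing the condition does not clear the lock-in.
            self.events.append((now, "cleared"))
        self.active = in_alarm

    def acknowledge(self, now: float) -> None:
        """Operator acknowledgment releases the lock-in."""
        if self.unacknowledged:
            self.unacknowledged = False
            self.events.append((now, "acknowledged"))
```

For example, an alarm that triggers at time 1.0 and clears at 2.0 still shows as unacknowledged until the operator acknowledges it, and all three events remain in `events` for later review.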
Safety interlock systems
Independent interlocks should be provided to protect human safety, the environment, and major equipment, and should be approved via a hazards analysis. A written control description including interlocks, emergency shutdown systems, and alarm systems should be reviewed and approved by operations, process technology, engineering, and safety personnel, before, during, and after design and whenever modifications to the design are proposed. Interlocks that are supplied with vendor equipment should be documented, reviewed, and approved prior to being placed in operation. A detailed, written test procedure for each safety interlock and emergency shutdown system should exist. This procedure should evaluate all components of the system and specify the method of testing. A remote and local emergency shutdown (panic) button should be available to the operator, to take the process to a safe state, in the event all else fails. A hard-wired circuit is recommended. If a programmable electronic system is used, it should be independent and include a UPS and watchdog timer. Interlock and emergency shutdown system logic programming should be accomplished without unnecessary complication. Interlocks and emergency shutdown systems should not be the same hardware that provides process control. Interlock coils should be energized during normal operation. Emergency shutdown buttons should be protected or strategically located to prevent accidental tripping. Critical interlocks and emergency shutdown systems should not reset automatically.
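Two of the recommendations above, coils energized during normal operation and no automatic reset, amount to a de-energize-to-trip latch. The sketch below models only that logic, with hypothetical names; it is not a substitute for the hard-wired, independent hardware the section recommends. Starting in the tripped state is an assumption of this sketch.

```python
class LatchingInterlock:
    """De-energize-to-trip interlock that requires a manual reset."""

    def __init__(self):
        # Assumption: start tripped, so loss of power or a restart is safe.
        self.coil_energized = False

    def evaluate(self, permissive_ok: bool) -> bool:
        """Run every scan; a lost permissive trips and latches the interlock."""
        if not permissive_ok:
            self.coil_energized = False
        return self.coil_energized

    def manual_reset(self, permissive_ok: bool) -> bool:
        """Deliberate operator action; only succeeds if the condition is healthy."""
        if permissive_ok:
            self.coil_energized = True
        return self.coil_energized
```

Because `evaluate` never re-energizes the coil, a permissive that returns to healthy leaves the interlock tripped until someone calls `manual_reset`.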
Security
All process computers should be subject to a computer policy and audited periodically. All process control systems should be located in a secure area. Keylocks and/or passwords
should prevent unauthorized personnel from modifying alarm settings, bypassing interlocks, and modifying programs or configurations. Their proper use should be documented in an operating procedure.
Design change control
Software changes should be subject to approval by a plant software change control board. Once the startup phase is completed, all process control system modifications should be formally requested, approved, documented, and functionally tested. This includes program code, high-level configuration, ladder logic, or hard-wired relay logic. Formal requests and approvals should be logged and filed for future reference with the history of the control system.
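A change record that enforces the four steps named above might look like the following sketch. The strict ordering of the steps is an assumption made for illustration, and all names are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed order; the text lists the steps but does not mandate a sequence.
STEPS = ("requested", "approved", "documented", "tested")


@dataclass
class ChangeRequest:
    description: str
    completed: list = field(default_factory=list)  # (step, person) history

    def advance(self, step: str, by: str) -> None:
        """Record the next step; reject any step taken out of order."""
        done = len(self.completed)
        expected = STEPS[done] if done < len(STEPS) else None
        if step != expected:
            raise ValueError(f"expected step {expected!r}, got {step!r}")
        self.completed.append((step, by))

    @property
    def may_be_placed_in_service(self) -> bool:
        """True only after every step has been recorded."""
        return len(self.completed) == len(STEPS)
```

The `completed` list doubles as the log of requests and approvals that should be filed with the history of the control system.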
Contingency plans
The contingency plan should not depend on a single employee. The details of the program or configuration should be documented and should be understood by more than one person. A procedure should be in place to ensure that software backups are current and are made on a regular basis, with one copy stored off-site. The backups should be properly labeled with their contents and date. The process control system should include a current, written, approved contingency plan concerning failure of equipment such as:
- the entire process control system;
- an operator's console;
- a processor;
- a power supply;
- communication between units;
- an input or output card;
- a redundant data highway;
- a redundant power source/backup.
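The backup requirements in this section (current, labeled, one copy off-site) lend themselves to a simple automated check. The sketch below is illustrative only: the record layout, the function name, and the 30-day threshold for "current" are assumptions, since the text does not define an interval.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Backup:
    label: str       # contents, as written on the media
    made_on: date
    off_site: bool


def backup_findings(backups, today, max_age_days=30):
    """Return audit findings for a set of software backups (empty list = none)."""
    findings = []
    cutoff = today - timedelta(days=max_age_days)  # assumed definition of "current"
    if not any(b.made_on >= cutoff for b in backups):
        findings.append("no current backup")
    if not any(b.off_site for b in backups):
        findings.append("no off-site copy")
    if any(not b.label.strip() for b in backups):
        findings.append("unlabeled backup")
    return findings
```

An auditor would still need to confirm that a backup can actually be restored; this check covers only the record-keeping.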
Maintenance
A preventive maintenance plan should be in place that includes periodic testing of critical
alarms, interlocks, redundant equipment, field devices, grounds, and emergency shutdown systems. All bypassed interlocks and alarms should be tagged in a conspicuous place. The spare parts inventory should be adequate to maintain an operable system. Field instruments, inputs, and outputs should be labeled for ease of maintenance. Wherever possible, without limiting the level of desired performance or automation, controls within the plant should be standardized by manufacturer and model. All circuit boards used in permanent installations should utilize printed circuit board technology. Maintenance personnel should keep records so that recurring problems can be identified.
Training
Operations, maintenance, and engineering personnel should be adequately trained to ensure safe automation.
Operations
A procedure for bypassing interlocks should be established and available for review, along with the documentation supporting these activities. Levels of approval should exist for each interlock, based on its criticality. A separate log should be kept for all bypassed interlocks (both hardware and software). The shift supervisor should review and initial the log once a shift. Interlocks requiring a manual reset or override should be addressed in a written procedure. All process control operators should be familiar with emergency power-down and power-failure procedures. Periodic drills or reviews should be conducted. The operating procedure should include the plan for a main plant power failure when the uninterruptible power supply (UPS) continues to supply power to the controls. Good housekeeping around process control equipment should be maintained. Supplies of paper and other combustibles should be kept to a minimum in process computer hardware areas. Smoking should not be permitted in control rooms. The operating procedure should list all
equipment (e.g., CPUs, printers, CRTs) that is connected to the UPS system. The operator should acknowledge all critical alarms. If a system has an automatic acknowledgment feature, it should not be used indiscriminately. The acknowledgment of alarms should be covered in the operating procedure. A procedure should be in place for pre-operational equipment checkout. Abnormal process control activity should be logged and the investigation documented.
Documentation
Complete documentation of the control system should be furnished to enable operations and maintenance to support the system. Documentation, including the process control description, should be kept current and reviewed periodically. The process control system should log changes in critical discrete devices and operator actions (e.g., setpoints, automatic-to-manual modes). Evidence of such data should be available for review. Programmable electronic system documentation should be enhanced with references and comments. Interlocks, emergency shutdown systems, and sequenced systems should be referenced on piping and instrument diagrams (P&IDs). Documentation should include narratives, block diagrams, time sequences, event trees, state diagrams, and input/output details. A current inventory list of all process computers should be kept. The list should specify those computers considered to be critical. The inventory should also include make, model number, process area controlled, building location, and personnel assigned as functional owners, custodians, security administrators, and executors of change.
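The inventory described above maps naturally onto a structured record. The following sketch uses the fields listed in the text; the class and function names are hypothetical, and treating a blank string as "missing" is an assumption.

```python
from dataclasses import dataclass, fields


@dataclass
class ProcessComputer:
    make: str
    model_number: str
    process_area: str
    building_location: str
    functional_owner: str
    custodian: str
    security_administrator: str
    executor_of_change: str
    critical: bool = False


def missing_fields(entry):
    """Names of text fields left blank in an inventory entry."""
    return [f.name for f in fields(entry)
            if isinstance(getattr(entry, f.name), str)
            and not getattr(entry, f.name).strip()]


def critical_computers(inventory):
    """Entries flagged as critical, in their original order."""
    return [pc for pc in inventory if pc.critical]
```

During an audit, `missing_fields` gives a quick list of incomplete entries, and `critical_computers` yields the subset that warrants the closest review.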
Conclusion
When reviewing and auditing fire protection, the environment, electrical power, ergonomics, fail-safety, alarms, safety interlock systems, security, design change control, contingency plans, maintenance, training, operations, and documentation, we find not only potential safety hazards,
but also better ways to make processes safer. We have the opportunity as reviewers and auditors to see many types of systems used in various applications (often discovering ingenious programming, configuration, and procedural methods), and to pass along and refer to other users innovative ways of dealing with challenging problems. Work well done should not go unnoticed; praise and recognition are an added incentive to continued quality performance. One question I was asked when I first accepted the challenge of being a process computer auditor was, "How can you do a job that will make everyone hate you?" But I have experienced no animosity on the part of plant personnel in the performance of this work, thanks chiefly to my having established at the outset an atmosphere of cooperation among them. Appearing on a plant site with a threatening manner certainly sets a bad tone. Instead, stress the purpose of the review or audit: helping each other toward the common goal of reducing risk to human life and the environment, damage to major equipment, loss of production, and loss of business. As an auditor, I have received much satisfaction in being able to help others. A plant employee once said to me, "When we heard you were coming, we asked why. After you left, we realized that now there is a greater awareness of process computer risks." I hope to leave this final thought at each plant: "Process control safety is no accident! It's up to each one of us, working together."