By Jane Anne Morris [1]
In retrospect, after an accident has occurred, it is often easy to describe the situation as an accident that was waiting to happen. Too often we just leave it at that, perhaps hoping that someone else will figure out how we could have foreseen it.
Somebody has.
Charles Perrow has reviewed a range of technologies—including petrochemical and nuclear facilities, dams, mines, weapons and space research, aircraft and airways, DNA research, and shipping—and written an insightful and persuasive analysis of the kinds of systems in which accidents are inevitable, or “normal” in his terminology. [2] Nuclear power plants are excellent examples of such systems.
According to Perrow, normal accidents occur in systems that share a few key characteristics. First, the system has many components (parts, procedures, and operators) arranged in a complex way. (That is, it is not one long linear process in which A leads to B, B leads to C, and so on, and which can easily be stopped at any point.)
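As a rough, back-of-the-envelope illustration of why the arrangement matters (this sketch is mine, not a calculation from Perrow): in a strictly linear chain each component interacts only with its neighbor, while in a densely connected system the number of possible pairwise interactions grows roughly with the square of the number of components.

```python
# Toy comparison: potential pairwise interactions in a linear chain
# versus a densely interconnected system of the same size.
# Purely illustrative; the numbers are not drawn from any real plant.

def linear_interactions(n: int) -> int:
    """Each component touches only its neighbor: A -> B -> C -> ..."""
    return n - 1

def dense_interactions(n: int) -> int:
    """Worst case: any component can interact with any other."""
    return n * (n - 1) // 2

for n in (10, 100, 1000):
    print(n, linear_interactions(n), dense_interactions(n))
# 10 components: 9 vs 45; 100: 99 vs 4,950; 1,000: 999 vs 499,500
```

The point is not the exact counts but the shape of the growth: the interactions an operator might have to reason about quickly outrun what any procedure manual can anticipate.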
In such a system, with many components arranged in complex ways, many small failures—faulty switches, burned-out light bulbs, minor operator errors—are bound to occur. Such failures are not expected to be catastrophic because numerous back-up and emergency response systems—themselves complex—are in place.
The second characteristic of a system that will, in Perrow’s analysis, experience normal accidents is that two or more failures (of parts, procedures, or operator judgment)—failures that may be trivial in themselves—can interact in unexpected ways. For example, part P (a light bulb on a gauge) might fail at the same time that part Q (part of a back-up system) is off-line for maintenance. The failure of part P might leave operators unaware that a problem was developing; the inactive status of part Q might disable the emergency system that would probably have either alerted operators to the problem or shut down the now-dangerous components.
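To make the interaction concrete, here is a minimal toy sketch in Python. The component names and the pressure threshold are invented for illustration; they are not taken from Perrow or from any real plant.

```python
# Toy model of two interacting failures (hypothetical names throughout).
# Part P: an indicator bulb that should light when pressure is high.
# Part Q: a back-up monitor that would otherwise catch what P misses.

from dataclasses import dataclass

@dataclass
class PlantState:
    pressure: float        # the actual (hidden) condition
    bulb_working: bool     # part P
    backup_online: bool    # part Q

def operator_display(state: PlantState) -> str:
    """What the control room sees, not what is actually happening."""
    if state.bulb_working and state.pressure > 100:
        return "WARNING"
    if state.backup_online and state.pressure > 100:
        return "WARNING (backup)"
    return "normal"        # both channels silent

# Either failure alone is trivial; together they hide the problem.
danger = PlantState(pressure=150, bulb_working=False, backup_online=False)
print(operator_display(danger))   # prints "normal" despite a pressure of 150
```

Each failure, taken on its own, would be caught by the other channel; it is the coincidence that leaves the display reading “normal.”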
But the problem is just beginning. For one thing, the operators may not know that anything unusual is happening. There is so much going on in a system with billions of components that they may not know part Q is off-line. They have no way of knowing that the light bulb in a particular gauge should be blinking “danger.” The complex system, with all of its gauges, back-up systems, and interdependent processes (for example, certain pumps switch on automatically when the temperature in a given area—or the gauge that shows it—reaches a pre-established threshold), continues to function and react. Until other things go “wrong,” the operators will be unaware that there is a problem. By the time they see a problem, they will be unable to act appropriately, because they have no way of knowing what else has happened.
In the case of Three Mile Island, it took many months of sifting through computer data, numerous interviews, and much technical analysis before a reasonable scenario of “what happened” could be constructed. This circumstance leads to an inherent contradiction in high-risk systems: the very procedures that are necessary during normal operations are hopelessly inadequate during emergencies.
During normal operations, a centralized control team must know exactly what each operator is doing, so that one person does not do something that would interact with another component to endanger the whole system. Therefore, operator procedures must be fixed and exact. However, accidents tend to happen when events occur for which there are no clear procedures. The lack of procedures, combined with operators’ ignorance of the exact state of affairs, means that operators must take independent action based on their best guess as to what is happening.
Without suggesting that back-up systems and redundant safety features should be eliminated, Perrow notes that these measures add to the complexity of a system and decrease the likelihood of timely comprehension of a problem.
For instance, suppose that to ensure the accuracy of control panel information about a very important measurement, there are not one but two gauges measuring the amount of water in a tank. Now suppose that one shows the tank is empty and the other that it is full. Is one gauge broken? If so, which one? Is the tank in fact empty, or is it full? Perhaps it is half full, and both gauges are malfunctioning or disconnected.
Assume that a partly full tank could account for observed leakage, and that an empty tank would explain overheating. What if other gauges suggest neither leakage nor overheating? Are both of these gauges accurate, or are one or both faulty, and if so, which? And so on. Even in this oversimplified hypothetical case, the possibilities multiply rapidly once doubt is introduced about each piece of information.
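As a rough illustration of that multiplication (the instrument names below are invented, and the arithmetic is only a sketch of the general point): if each of a handful of readings may independently be either trusted or doubted, the number of candidate explanations doubles with every reading.

```python
# Each reading can be either trusted or suspect, so N readings
# yield 2**N candidate explanations to sort through.
from itertools import product

readings = ["level gauge A", "level gauge B", "leak detector", "temperature gauge"]

hypotheses = list(product(["trusted", "suspect"], repeat=len(readings)))
print(len(hypotheses))   # 16 combinations for just four instruments

# With 20 interdependent readings the count exceeds a million:
print(2 ** 20)           # 1048576
```

An operator in a developing emergency does not, of course, enumerate hypotheses this way; the sketch only shows why no one could.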
In some technologies, normal accidents are relatively frequent but limited in catastrophic potential. Wind turbines can be dangerous and have caused horrible injuries to maintenance personnel, but they are not going to spin off and decapitate thousands of people. In nuclear technologies, the catastrophic potential is immense, as we are seeing again at Fukushima. Nuclear accidents, viewed through historical or systems analysis, are a certainty. It is hubris to think otherwise.
Notes:
1. This article first appeared in the Winter 2012 edition of Synthesis/Regeneration.
2. Charles Perrow, Normal Accidents: Living with High-Risk Technologies (Basic Books, New York, 1984), offers a framework for evaluating and reducing the risk of accidents in various industrial technologies.