SYSTEM SAFETY ENGINEERING
Question #1
What are some possible negative outcomes of having replaced older electromechanical controls with computerized and digitalized controls?
Before the era of computerized and digitalized controls, electromechanical controls were widely used. The switch from electromechanical controls to computerized and digitalized controls was motivated by the advantages the latter offer. For example, computerized and digitalized controls offer not only greater flexibility but also the ability to be reprogrammed dynamically when utility operating characteristics need to change (Willis, 2004). Despite these advantages, computerized and digitalized controls have been found to play a key role in causing accidents (Leveson, 2011). Leveson pointed out that several assumptions are commonly made about the safety of computerized and digitalized controls. One example is the assumption that because they are highly reliable, they are safe. Contrary to this popular belief, computerized and digitalized controls sometimes cause new types of accidents. Their use has also made it harder to investigate accidents and to find ways of preventing them. According to Leveson (2011), one source of the problems seen in computerized and digitalized controls is the communication gap between the engineer who designs the machine and the software developer who writes the programmable control. This gap leads to requirements flaws because the requirements may be based on wrong or incomplete assumptions. In an accident involving an F-18, for example, a mechanical failure in the aircraft caused inputs to arrive faster than expected, and this overwhelmed the software. Leveson argued that simply making the software “correct”, in the sense that it implements its requirements, does not necessarily make it safer.
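The F-18 example can be illustrated with a minimal Python sketch. It is not drawn from the actual flight software: the buffer size, the per-cycle processing capacity, and the names `simulate`, `capacity_per_cycle`, and `buffer_size` are all assumptions chosen only to show how software that handles its assumed input rate without error can still lose inputs once that rate assumption is violated.

```python
from collections import deque


def simulate(arrivals_per_cycle, capacity_per_cycle=2, buffer_size=4):
    """Return how many inputs are dropped over a run.

    arrivals_per_cycle: list of ints, inputs arriving in each cycle.
    capacity_per_cycle: inputs the software can process per cycle (assumed).
    buffer_size: maximum number of queued inputs (assumed).
    """
    queue = deque()
    dropped = 0
    for arrivals in arrivals_per_cycle:
        for _ in range(arrivals):
            if len(queue) < buffer_size:
                queue.append(1)
            else:
                dropped += 1  # buffer full: this input is lost
        # Process as many queued inputs as the capacity allows.
        for _ in range(min(capacity_per_cycle, len(queue))):
            queue.popleft()
    return dropped


nominal = simulate([2] * 10)  # arrival rate matches the design assumption
faulted = simulate([5] * 10)  # a fault makes inputs arrive faster than assumed
print("dropped at assumed rate:", nominal)
print("dropped at faulted rate:", faulted)
```

The code is identical in both runs; only the environment changes. That is the sense in which software can be "correct" with respect to its requirements and still contribute to an accident.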
Question #2
Explain in detail the three basic constructs that underlie STAMP.
Three basic concepts underpin the Systems-Theoretic Accident Model and Processes (STAMP) model of accident causation: safety constraints, the hierarchical safety control structure, and process models (Leveson, 2011). Underlying these three concepts are basic systems theory concepts. In the STAMP model, systems are viewed as interrelated components that are kept in a state of equilibrium by feedback control loops.
Safety in a system is achieved when the desired constraints on the behavior of the system and its components are satisfied. Because systems change and adapt over time, the initial design must not only enforce the constraints needed for safe operation, but the system must also continue to enforce them as those changes occur. Leveson (2011) contends that accidents result from flawed processes involving interactions among people, organizational and societal structures, physical system components, and engineering activities that lead to violations of the system safety constraints. She further argued that safety management should be defined not as preventing component failures but as creating a safety control structure that enforces the behavioral safety constraints and remains effective as changes and adaptations take place over time. When analyzing an accident, a model such as STAMP helps identify the safety constraints that were violated and determine why the controls were inadequate to enforce them. Besides allowing consideration of more accident causes than simple component failures, the STAMP model also facilitates more sophisticated analysis of failures and of component failure accidents (Leveson, 2011).
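A minimal Python sketch of a single feedback control loop may help make the constructs concrete. It models one controller rather than a full hierarchical control structure, and the tank-filling process, the numeric limits, and the names `Controller`, `run`, and `max_safe_level` are hypothetical choices for illustration, not taken from Leveson (2011). The controller acts on its process model (what it believes the level to be); when feedback stops updating that model, the safety constraint is eventually violated even though no component has "failed" in the conventional sense.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    believed_level: float = 0.0  # process model: the controller's belief
    setpoint: float = 5.0

    def update_model(self, measured_level):
        """Feedback channel: bring the process model up to date."""
        self.believed_level = measured_level

    def control_action(self):
        """Control algorithm: keep filling while the model says the level is low."""
        return 1.0 if self.believed_level < self.setpoint else 0.0


def run(steps, feedback_works, max_safe_level=8.0):
    """Return True if the safety constraint (level <= max_safe_level) always held."""
    level = 0.0
    controller = Controller()
    for _ in range(steps):
        level += controller.control_action()
        if feedback_works:
            controller.update_model(level)
        if level > max_safe_level:
            return False  # safety constraint violated
    return True


print("constraint held with feedback:", run(20, feedback_works=True))
print("constraint held without feedback:", run(20, feedback_works=False))
```

In the second run the control algorithm is unchanged; the constraint is violated because the process model no longer matches the real state of the controlled process.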
Question #3
Consider that the analysis findings revealed that there were no explicit or written procedures regarding the control of helicopters on AWACS operations. Describe and explain the reason for this flawed control.
The analysis revealed several factors that contributed to the flawed control of the helicopters in AWACS operations. The flaws identified include inadequate control algorithms, inaccurate and inconsistent mental models, poor coordination among multiple controllers, inadequate feedback from the controlled process, and time lags (Leveson, 2011). Adequate control algorithms are crucial to safety. Radio contact with the helicopters was frequently lost, yet there were no explicit or written procedures or guidelines to follow when this occurred. In contrast, the AWACS operations manual stated that because helicopters are a high-interest track, they had to be hard copied every two minutes in Iraq and every five minutes in Turkey. The coordinates were to be entered into a special logbook because radar contact with helicopters can be lost and the radar symbology might be suspended. Many helicopter missions flew only from Diyarbakir to Zakhu and back, so the controllers saw no need to hand them off and switch them to the TAOR frequency for just a few minutes. Established practice was to keep the helicopters under the control of the en route controller rather than hand them over to the TAOR controller. This practice appeared safe until the behavior of the helicopters differed from normal, that is, until they stayed in the TAOR longer than usual instead of remaining only a few miles inside its boundary. Another complicating factor was the widespread misunderstanding of each controller’s responsibility for tracking Army helicopters. The reluctance of the AWACS crew to enforce the rules contributed to the failure of AWACS to correct the improper Mode I code of Eagle Flight. Similarly, a reluctance to say more than was necessary discouraged the controllers from pushing the helicopter pilots to the TAOR frequency upon entry into Iraq.
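The hard-copy rule quoted from the operations manual is the kind of procedure that can be written down explicitly. The following Python sketch is only an illustration of that point: the two intervals come from the text above, while the track identifiers, the data layout, and the function name `overdue_tracks` are hypothetical and do not describe any real AWACS system.

```python
# Hard-copy intervals in minutes, as stated in the text above.
HARD_COPY_INTERVAL_MIN = {"Iraq": 2, "Turkey": 5}


def overdue_tracks(last_logged, now_min):
    """List the tracks whose logbook entry is older than the allowed interval.

    last_logged maps a track id to (region, minute of the last logbook entry).
    """
    return sorted(
        track
        for track, (region, logged_at) in last_logged.items()
        if now_min - logged_at > HARD_COPY_INTERVAL_MIN[region]
    )


# Hypothetical log state: both tracks were last logged at minute 10.
log = {"HELO-1": ("Iraq", 10), "HELO-2": ("Turkey", 10)}
print(overdue_tracks(log, now_min=13))
```

An explicit rule of this kind tells a controller both when an entry is due and which track it concerns, which is what the unwritten handoff practice lacked.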
Question #1
Describe the relationship between hazards and system boundaries. Please provide at least one scenario to support your explanation.
The relationship between hazards and system boundaries follows from the definition of what constitutes a hazard: where the system boundaries are drawn influences what counts as a hazard. A system is an abstraction, and its boundaries can be drawn wherever the person defining the system wishes. The location of the boundaries determines which conditions are considered part of the hazard and which are considered part of the environment. Because the choice is arbitrary, the most useful way to define the boundaries and the hazard is to draw them so that the hazard comprises those conditions associated with the accident over which the system designer has some control (Leveson, 2011). In other words, if designers are expected to create systems that eliminate or control hazards and thereby prevent accidents, the hazards must lie within their design space. According to Leveson, the relationship between hazards and boundaries can be explained using an air traffic control system. If an accident is defined as a collision between aircraft, then the corresponding hazard is a violation of the minimum separation between aircraft. The designer of an air traffic control system intended to avoid collisions has control over the separation between airborne aircraft, but not over other factors that influence whether two aircraft in close proximity will actually collide. Examples of such factors include weather conditions, visibility, and the attentiveness or state of mind of the pilots. These factors are outside the control of the system designer and belong to other system components, such as the control center responsible for directing flights during poor weather or the components responsible for aircraft design and for the selection and training of pilots (Leveson, 2011).
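The air traffic control scenario can be sketched in a few lines of Python. This is an illustration under stated assumptions: the 5-nautical-mile threshold, the flat two-dimensional positions (altitude is ignored), and the names `separation_nm` and `hazard_present` are hypothetical. The point is in the function signature: the hazard is defined only in terms of separation, which the designer controls, while weather and pilot attentiveness do not appear as inputs because they lie outside the chosen system boundary.

```python
import math

MIN_SEPARATION_NM = 5.0  # assumed threshold, for illustration only


def separation_nm(a, b):
    """Horizontal distance between two (x, y) positions given in nautical miles."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def hazard_present(a, b, min_separation=MIN_SEPARATION_NM):
    """The hazard exists when minimum separation is violated.

    Whether a collision actually follows depends on conditions outside
    this boundary, so they are deliberately not parameters here.
    """
    return separation_nm(a, b) < min_separation


print(hazard_present((0.0, 0.0), (3.0, 3.0)))  # aircraft about 4.2 nm apart
print(hazard_present((0.0, 0.0), (6.0, 8.0)))  # aircraft 10 nm apart
```

Drawing the boundary differently, for example to include pilot training, would change which conditions count as part of the hazard and which as part of the environment.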
Question #2
Explain in detail the difference between a system hazard and a system safety constraint. Please provide at least one scenario to support your explanation.
A system hazard is defined as a system state or set of conditions that, in conjunction with a set of worst-case environmental conditions, can lead to a loss or an accident. A system safety constraint, or system safety requirement, is a requirement placed on the system so that certain hazards are prevented from occurring (Leveson, 2011). Constraints are normally used to guide the system design and trade-off analyses. As the system is decomposed during engineering, the system-level constraints are refined and allocated to individual components. Table 1 shows typical design constraints that might be derived from the hazards of an automated aircraft door. Note that the third constraint conflicts with the last constraint. Resolving such a conflict is a critical part of the system design process. Identifying conflicting constraints early in the design process leads to better solutions, whereas later on the choices may be limited because it may no longer be practical or possible to change early decisions. The safety requirements and safety constraints are further refined and expanded as the design proceeds and design decisions are made. One safety constraint on TCAS, for example, is that it must not interfere with the ground-based air traffic control system. Later in the process, this constraint has to be refined into more detailed constraints on the various ways in which such interference might occur.
Question #3
Describe how high-level system hazards are identified and ultimately accommodated for within the system safety process using the STAMP model. Please provide at least one scenario to support your explanation.
Rather than starting hazard identification from an overly long list, the best approach is to begin with a definition of a loss or an accident, together with safety criteria imposed by regulatory agencies or by industry associations and practices. For example, in a hazard list, each hazard can be followed by a number in square brackets indicating the accident with which it is associated. The high-level system hazards might be derived from the accidents defined for a particular institution's or organization's processes. During system design, the high-level hazards are refined as design alternatives are considered. Unsafe behavior identified at the system level can be translated into hazardous behaviors at the component or subsystem level. The reverse process, however, is impossible: system-level hazards cannot be identified just by looking at the behavior of each component, because safety is a system property and not a component property (Leveson, 2011). Leveson also pointed out that there are no tools that can be employed to identify hazards. According to her, identification depends on domain expertise and on the subjective evaluation of those tasked with constructing the system. On the positive side, she noted that identifying hazards is not a complex process. Leveson (2011) also pointed out that there are no right or wrong hazards, only the set of hazards that the decision makers consider important to avoid.
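The bracketed cross-reference between hazards and accidents can be sketched as a small traceability structure in Python. The single accident and hazard reuse the air traffic control example from earlier in this document; the `A1`/`H1` labels, the data layout, and the function name `format_hazard_list` are assumptions made for illustration rather than a notation prescribed by Leveson (2011).

```python
# Accidents are defined first; each hazard then cites the accident(s) it can lead to.
accidents = {1: "Two aircraft collide"}
hazards = [("Minimum separation between aircraft is violated", [1])]


def format_hazard_list(hazards, accidents):
    """Return hazard lines, each ending with its bracketed accident references."""
    lines = []
    for index, (description, accident_ids) in enumerate(hazards, start=1):
        unknown = [a for a in accident_ids if a not in accidents]
        if unknown:
            # A hazard that traces to no defined accident signals a gap in the analysis.
            raise ValueError(f"H{index} cites undefined accidents: {unknown}")
        refs = ", ".join(f"A{a}" for a in accident_ids)
        lines.append(f"H{index}. {description} [{refs}]")
    return lines


for line in format_hazard_list(hazards, accidents):
    print(line)
```

Rejecting a hazard that cites an undefined accident reflects the top-down order described above: accidents are defined first, and hazards are derived from them.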
In the STAMP model, systems are regarded as interrelated components that are kept in a state of equilibrium by feedback control loops. Once the high-level hazards are identified, they are analyzed to determine their impact on the system's state of equilibrium. Safety in such a system is achieved only when the desired constraints on the behavior of the system and its components are satisfied. The stakeholders need to agree on the hazards and prioritize them. The idea is to avoid the hazards that are deemed likely to cause major losses or serious accidents. Changes are then made to the system to make it safer. As changes and adaptations take place over time, the system must continue to enforce the constraints on its behavior that the initial design put in place to ensure safe operation. One scenario is the identification of hazards during TCAS design. Another is the identification of the hazards likely to arise in a nuclear reactor. Some hazards are extreme, while others are mild. The US Department of Defense, however, requires producers of nuclear weapons always to consider four hazards. This is a typical example of how institutions collaborate with other stakeholders to prioritize hazards.
References
Leveson, N.G. (2011). Engineering a Safer World: Systems Thinking Applied to Safety. Cambridge, MA: MIT Press.
Willis, H.L. (2004). Power Distribution Planning Reference Book (2nd ed.). New York, NY: Marcel Dekker.