
Wednesday, June 26, 2013

Dynamic Interactive Training

The words dynamic and interactive always catch our attention as they are intrinsic to our world view of nuclear safety culture learning.  The source of our interest is Carlo Rusconi’s presentation* at the IAEA International Experts’ Meeting on Human and Organizational Factors in Nuclear Safety in the Light of the Accident at the Fukushima Daiichi Nuclear Power Plant, held in Vienna in May 2013.

While much of the training described in the presentation appeared to be oriented to the worker level and the identification of workplace type hazards and risks, it clearly has implications for supervisory and management levels as well.

In the first part of the training students are asked to identify and characterize safety risks associated with workplace images.  For each risk they assign an index based on perceived likelihood and severity.  We like the parallel to our proposed approach for scoring decisions according to safety significance and uncertainty.**
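To make the idea of such an index concrete, here is a minimal sketch.  Rusconi’s slides do not spell out a formula, so the 1-5 scales and the simple likelihood-times-severity product below are our own illustrative assumptions.

```python
# Illustrative only: a screening-level risk index of the kind described above,
# combining a perceived likelihood score with a perceived severity score.
# The 1-5 ordinal scales and the multiplicative combination are assumptions.

def risk_index(likelihood: int, severity: int) -> int:
    """Return a simple risk index for one identified workplace risk."""
    if not (1 <= likelihood <= 5 and 1 <= severity <= 5):
        raise ValueError("scores must be on the assumed 1-5 scales")
    return likelihood * severity

# Example: a risk judged unlikely (2) but severe (5) scores the same (10)
# as one judged likely (5) but of modest severity (2).
print(risk_index(2, 5), risk_index(5, 2))
```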

“...the second part of the course is focused on developing skills to look in depth at events that highlight the need to have a deeper and wider vision of safety, grasping the explicit and implicit connections among technological, social, human and organizational features. In a nutshell: a systemic vision.” (slide 13, emphasis added)  As part of the training students are exposed to the concepts of complexity, feedback and internal dynamics of a socio-technical system.  As the author notes, “The assessment of culture within an organization requires in-depth knowledge of its internal dynamics”.  (slide 15)

This part of the training is described as a “simulation” as it provides the opportunity for students to simulate the performance of an investigation into the causes of an actual event.  Students are organized into three groups of five persons to gain the benefit of collective analysis within each group followed by sharing of results across groups.  We see this as particularly valuable as it helps build common mental models and facilitates integration across individuals.  Finally, the training session compares the students’ results to the outcomes from a panel of experts.  Again we see a distinct parallel to our concept of having senior management within the nuclear organization pre-analyze safety issues to establish reference values for safety significance, uncertainty and preferred decisions.  This provides the basis to compare trainee outcomes for the same issues and ultimately to foster alignment within the organization.

Thank you, Dr. Rusconi.



*  C. Rusconi, “Interactive training: A methodology for improving Safety Culture,” IAEA International Experts’ Meeting on Human and Organizational Factors in Nuclear Safety in the Light of the Accident at the Fukushima Daiichi Nuclear Power Plant, Vienna May 21-24, 2013.

**  See our blog posts dated April 9 and June 6, 2013.  We also remind readers of Taleb’s dictum that decision makers should focus on consequences rather than probability, discussed in our post dated June 18, 2013.

Thursday, December 20, 2012

The Logic of Failure by Dietrich Dörner

This book was mentioned in a nuclear safety discussion forum so we figured this was a good time to revisit Dörner's 1989 tome.* Below we provide a summary of the book followed by our assessment of how it fits into our interest in decision making and the use of simulations in training.

Dörner's work focuses on why people fail to make good decisions when faced with problems and challenges. In particular, he is interested in the psychological needs and coping mechanisms people exhibit. His primary research method is observing test subjects interact with simulation models of physical sub-worlds, e.g., a malfunctioning refrigeration unit, an African tribe of subsistence farmers and herdsmen, or a small English manufacturing city. He applies his lessons learned to real situations, e.g., the Chernobyl nuclear plant accident.

He proposes a multi-step process for improving decision making in complicated situations then describes each step in detail and the problems people can create for themselves while executing the step. These problems generally consist of tactics people adopt to preserve their sense of competence and control at the expense of successfully achieving overall objectives. Although the steps are discussed in series, he recognizes that, at any point, one may have to loop back through a previous step.

Goal setting

Goals should be concrete and specific to guide future steps. The relationships between and among goals should be specified, including dependencies, conflicts and relative importance. When people don't do this, they can become distracted by obvious or unimportant (although potentially achievable) goals, or peripheral issues they know how to address rather than important issues that should be resolved. Facing performance failure, they may attempt to turn failure into success with doublespeak or blame unseen forces.

Formulate models and gather information

Good decision-making requires an adequate mental model of the system being studied—the variables that comprise the system and the functional relationships among them, which may include positive and negative feedback loops. The model's level of detail should be sufficient to understand the interrelationships among the variables the decision maker wants to influence. Unsuccessful test subjects were inclined to use a “reductive hypothesis,” which unreasonably reduces the model to a single key variable, or overgeneralization.

Information gathered is almost always incomplete and the decision maker has to decide when he has enough to proceed. The more successful test subjects asked more questions and made fewer decisions (than the less successful subjects) in the early time periods of the sim.

Predict and extrapolate

Once a model is formulated, the decision maker must attempt to determine how the values of variables will change over time in response to his decisions or internal system dynamics. One problem is predicting that outputs will change in a linear fashion, even as the evidence grows for a non-linear, e.g., exponential function. An exponential variable may suddenly grow dramatically then equally suddenly reverse course when the limits on growth (resources) are reached. Internal time delays mean that the effects of a decision are not visible until some time in the future. Faced with poor results, unsuccessful test subjects implement or exhibit “massive countermeasures, ad hoc hypotheses that ignore the actual data, underestimations of growth processes, panic reactions, and ineffectual frenetic activity.” (p. 152) Successful subjects made an effort to understand the system's dynamics, kept notes (history) on system performance and tried to anticipate what would happen in the future.
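A tiny numerical illustration of the extrapolation trap Dörner describes (our example, not his): a linear projection fitted to the early history of an exponential process badly understates where the process ends up.

```python
# Illustrative numbers only: compare a linear extrapolation based on early
# observations with the actual values of an exponentially growing variable.

growth_rate = 0.3          # assumed 30% growth per period
values = [100.0]
for _ in range(10):
    values.append(values[-1] * (1 + growth_rate))

# A "linear" forecaster looks at the first two periods and projects that slope.
slope = values[1] - values[0]
linear_forecast = [values[0] + slope * t for t in range(11)]

for t in (2, 5, 10):
    print(f"t={t}: actual={values[t]:.0f}, linear forecast={linear_forecast[t]:.0f}")
# By t=10 the actual value is roughly 3.5 times the linear projection.
```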

Plan and execute actions, check results and adjust strategy

“The essence of planning is to think through the consequences of certain actions and see whether those actions will bring us closer to our desired goal.” (p. 153) Easier said than done in an environment of too many alternative courses of action and too little time. In rapidly evolving situations, it may be best to create rough plans and delegate as many implementing decisions as possible to subordinates. A major risk is thinking that planning has been so complete that the unexpected cannot occur. A related risk is the reflexive use of historically successful strategies. “As at Chernobyl, certain actions carried out frequently in the past, yielding only the positive consequences of time and effort saved and incurring no negative consequences, acquire the status of an (automatically applied) ritual and can contribute to catastrophe.” (p. 172)

In the sims, unsuccessful test subjects often exhibited “ballistic” behavior—they implemented decisions but paid no attention to, i.e., did not learn from, the results. Successful subjects watched for the effects of their decisions, made adjustments and learned from their mistakes.

Dörner identified several characteristics of people who tended to end up in a failure situation. They failed to formulate their goals, didn't recognize goal conflict or set priorities, and didn't correct their errors. (p. 185) Their ignorance of interrelationships among system variables and the longer-term repercussions of current decisions set the stage for ultimate failure.

Assessment

Dörner's insights and models have informed our thinking about human decision-making behavior in demanding, complicated situations. His use and promotion of simulation models as learning tools was one starting point for Bob Cudlin's work in developing a nuclear management training simulation program. Like Dörner, we see simulation as a powerful tool to “observe and record the background of planning, decision making, and evaluation processes that are usually hidden.” (pp. 9-10)

However, this book does not cover the entire scope of our interests. Dörner is a psychologist interested in individuals; group behavior is beyond his range. He alludes to normalization of deviance but his references appear limited to the flouting of safety rules rather than a more pervasive process of slippage. More importantly, he does not address behavior that arises from the system itself, in particular adaptive behavior as an open system reacts to and interacts with its environment.

From our view, Dörner's suggestions may help the individual decision maker avoid common pitfalls and achieve locally optimum answers. On the downside, following Dörner's prescription might lead the decision maker to an unjustified confidence in his overall system management abilities. In a truly complex system, no one knows how the entire assemblage works. It's sobering to note that even in Dörner's closed,** relatively simple models many test subjects still had a hard time developing a reasonable mental model, and some failed completely.

This book is easy to read and Dörner's insights into the psychological traps that limit human decision making effectiveness remain useful.


* D. Dörner, The Logic of Failure: Recognizing and Avoiding Error in Complex Situations, trans. R. and R. Kimber (Reading, MA: Perseus Books, 1998). Originally published in German in 1989.

** One simulation model had an external input.

Wednesday, December 5, 2012

Drift Into Failure by Sidney Dekker

Sidney Dekker's Drift Into Failure* is a noteworthy effort to provide new insights into how accidents and other bad outcomes occur in large organizations. He begins by describing two competing world views, the essentially mechanical view of the world spawned by Newton and Descartes (among others), and a view based on complexity in socio-technical organizations and a systems approach. He shows how each world view biases the search for the “truth” behind how accidents and incidents occur.

Newtonian-Cartesian (N-C) Vision

Isaac Newton and René Descartes were leading thinkers during the dawn of the Age of Reason. Newton used the language of mathematics to describe the world while Descartes relied on the inner process of reason. Both believed there was a single reality that could be investigated, understood and explained through careful analysis and thought—complete knowledge was possible if investigators looked long and hard enough. The assumptions and rules that started with them, and were extended by others over time, have been passed on and most of us accept them, uncritically, as common sense, the most effective way to look at the world.

The N-C world is ruled by invariant cause-and-effect; it is, in fact, a machine. If something bad happens, then there was a unique cause or set of causes. Investigators search for these broken components, which could be physical or human. It is assumed that a clear line exists between the broken part(s) and the overall behavior of the system. The explicit assumption of determinism leads to an implicit assumption of time reversibility—because system performance can be predicted from time A if we know the starting conditions and the functional relationships of all components, then we can start from a later time B (the bad outcome) and work back to the true causes. (p. 84) Root cause analysis and criminal investigations are steeped in this world view.

In this view, decision makers are expected to be rational people who “make decisions by systematically and consciously weighing all possible outcomes along all relevant criteria.” (p. 3) Bad outcomes are caused by incompetent or, worse, corrupt decision makers. Fixes include more communications, training, procedures, supervision, exhortations to try harder and criminal charges.

Dekker credits Newton et al. for giving man the wherewithal to probe Nature's secrets and build amazing machines. However, the Newtonian-Cartesian vision is not the only way to view the world, especially the world of complex, socio-technical systems. For that a new model, with different concepts and operating principles, is required.

The Complex System

Characteristics

The sheer number of parts does not make a system complex, only complicated. A truly complex system is open (it interacts with its environment), has components that act locally and don't know the full effects of their actions, is constantly making decisions to maintain performance and adapt to changing circumstances, and has non-linear interactions (small events can cause large results) because of multipliers and feedback loops. Complexity is a result of the ever-changing relationships between components. (pp.138-144)

Adding to the myriad information confronting a manager or observer, system performance is often optimized at the edge of chaos, where competitors are perpetually vying for relative advantage at an affordable cost.** The system is constantly balancing its efforts between exploration (which will definitely incur costs but may lead to new advantages) and exploitation (which reaps benefits of current advantages but will likely dissipate over time). (pp. 164-165)

The most important feature of a complex system is that it adapts to its environment over time in order to survive. And its environment is characterized by resource scarcity and competition. There is continuous pressure to maintain production and increase efficiency (and their visible artifacts: output, costs, profits, market share, etc) and less visible outputs, e.g., safety, will receive less attention. After all, “Though safety is a (stated) priority, operational systems do not exist to be safe. They exist to provide a service or product . . . .” (p. 99) And the cumulative effect of multiple adaptive decisions can be an erosion of safety margins and a changed response of the entire system. Such responses may be beneficial or harmful—a drift into failure.

Drift by a complex system exhibits several characteristics. First, as mentioned above, it is driven by environmental factors. Second, drift occurs in small steps so changes can be hardly noticed, and even applauded if they result in local performance improvement; “. . . successful outcomes keep giving the impression that risk is under control” (p. 106) as a series of small decisions whittle away at safety margins. Third, these complex systems contain unruly technology (think deepwater drilling) where uncertainties exist about how the technology may be ultimately deployed and how it may fail. Fourth, there is significant interaction with a key environmental player, the regulator, and regulatory capture can occur, resulting in toothless oversight.

“Drifting into failure is not so much about breakdowns or malfunctioning of components, as it is about an organization not adapting effectively to cope with the complexity of its own structure and environment.” (p. 121) Drift and occasionally accidents occur because of ordinary system functioning, normal people going about their regular activities making ordinary decisions “against a background of uncertain technology and imperfect information.” Accidents, like safety, can be viewed as an emergent system property, i.e., they are the result of system relationships but cannot be predicted by examining any particular system component.

Managers' roles

Managers should not try to transform complex organizations into merely complicated ones, even if it's possible. Complexity is necessary for long-term survival as it maximizes organizational adaptability. The question is how to manage in a complex system. One key is increasing the diversity of personnel in the organization. More diversity means less group think and more creativity and greater capacity for adaptation. In practice, this means validation of minority opinions and encouragement of dissent, reflecting on the small decisions as they are made, stopping to ponder why some technical feature or process is not working exactly as expected and creating slack to reduce the chances of small events snowballing into large failures. With proper guidance, organizations can drift their way to success.

Accountability

Amoral and criminal behavior certainly exist in large organizations but bad outcomes can also result from normal system functioning. That's why the search for culprits (bad actors or broken parts) may not always be appropriate or adequate. This is a point Dekker has explored before, in Just Culture (briefly reviewed here) where he suggests using accountability as a means to understand the system-based contributors to failure and resolve those contributors in a manner that will avoid recurrence.

Application to Nuclear Safety Culture

A commercial nuclear power plant or fleet is probably not a complete complex system. It interacts with environmental factors but in limited ways; it's certainly not directly exposed to the Wild West competition of, say, the cell phone industry. Group think and normalization of deviance*** are a constant threat. The technology is reasonably well-understood but changes, e.g., uprates based on more software-intensive instrumentation and control, may be invisibly sanding away safety margin. Both the industry and the regulator would deny regulatory capture has occurred but an outside observer may think the relationship is a little too cozy. Overall, the fit is sufficiently good that students of safety culture should pay close attention to Dekker's observations.

In contrast, the Hanford Waste Treatment Plant (Vit Plant) is almost certainly a complex system and this book should be required reading for all managers in that program.

Conclusion

Drift Into Failure is not a quick read. Dekker spends a lot of time developing his theory, then circling back to further explain it or emphasize individual pieces. He reviews incidents (airplane crashes, a medical error resulting in patient death, software problems, public water supply contamination) and descriptions of organization evolution (NASA, international drug smuggling, “conflict minerals” in Africa, drilling for oil, terrorist tactics, Enron) to illustrate how his approach results in broader and arguably more meaningful insights than the reports of official investigations. Standing on the shoulders of others, especially Diane Vaughan, Dekker gives us a rich model for what might be called the “banality of normalization of deviance.” 


* S. Dekker, Drift Into Failure: From Hunting Broken Components to Understanding Complex Systems (Burlington VT: Ashgate 2011).

** See our Sept. 4, 2012 post on Cynefin for another description of how the decisions an organization faces can suddenly slip from the Simple space to the Chaotic space.

*** We have posted many times about normalization of deviance, the corrosive organizational process by which yesterday's “unacceptable” becomes today's “good enough.”

Tuesday, July 31, 2012

Regulatory Influence on Safety Culture

In September 2011 the Nuclear Energy Agency (NEA) and the International Atomic Energy Agency (IAEA) held a workshop for regulators and industry on oversight of licensee management.  “The principal aim of the workshop was to share experience and learning about the methods and approaches used by regulators to maintain oversight of, and influence, nuclear licensee leadership and management for safety, including safety culture.”*

Representatives from several countries made presentations.  For example, the U.S. presentation by NRC’s Valerie Barnes and INPO’s Ken Koves discussed work to define safety culture (SC) traits and correlate them to INPO principles and ROP findings (we previously reviewed this effort here).  Most other presentations also covered familiar territory. 

However, we were very impressed by Prof. Richard Taylor’s keynote address.  He is from the University of Bristol and has studied organizational and cultural factors in disasters and near-misses in both nuclear and non-nuclear contexts.  His list of common contributors includes issues with leadership, attitudes, environmental factors, competence, risk assessment, oversight, organizational learning and regulation.  He expounded on each factor with examples and additional detail. 

We found his conclusion most encouraging:  “Given the common precursors, we need to deepen our understanding of the complexity and interconnectedness of the socio-political systems at the root of organisational accidents.”  He suggests using system dynamics modeling to study archetypes including “maintaining visible convincing leadership commitment in the presence of commercial pressures.”  This is totally congruent with the approach we have been advocating for examining the effects of competing business and safety pressures on management. 

Unfortunately, this was the intellectual high point of the proceedings.  Topics that we believe are important to assessing and understanding SC got short shrift thereafter.  In particular, goal conflict, corrective action programs (CAP) and management compensation were not mentioned by any of the other presenters.

Decision-making was mentioned by a few presenters but there was no substantive discussion of this topic (the U.K. presenter had a motherhood statement that “Decisions at all levels that affect safety should be rational, objective, transparent and prudent”; the Barnes/Koves presentation appeared to focus on operational decision making).  A bright spot was in the meeting summary where better insight into licensees’ decision making process was mentioned as desirable and necessary by regulators.  And one suggestion for future research was “decision making in the face of competing goals.”  Perhaps there is hope after all.

(If this post seems familiar, last Dec 5 we reported on a Feb 2011 IAEA conference for regulators and industry that covered some of the same ground.  Seven months later the bureaucrats had inched the football a bit down the field.)


*  Proceedings of an NEA/IAEA Workshop, Chester, U.K. 26-28 Sept 2011, “Oversight and Influencing of Licensee Leadership and Management for Safety, Including Safety Culture – Regulatory Approaches and Methods,” NEA/CSNI/R(2012)13 (June 2012).

Friday, July 27, 2012

Modeling Safety Culture (Part 4): Simulation Results 2


As we introduced in our prior post on this subject (Results 1), we are presenting some safety culture simulation results based on a highly simplified model.  In that post we illustrated how management might react to business pressure caused by a reduction in authorized budget dollars.  The actions of management result in shifting of resources from safety to business and lead to changes in the state of safety culture.

In this post we continue with the same model and some other interesting scenarios.  In each of the following charts three outputs are plotted: safety culture in red, management action level in blue and business pressure in dark green.  The situation is an organization with a somewhat lower initial safety culture and confronted with a somewhat smaller budget reduction than the example in Results 1. 

Figure 1
Figure 1 shows an overly reactive management. The blue line shows management’s actions in response to the changes in business pressure (green) associated with the budget change.  Note that management’s actions are reactive, shifting priorities immediately and directly in response. This behavior leads to a cyclic outcome where management actions temporarily alleviate business pressure, but when actions are relaxed, pressure rises again, followed by another cycle of management response.  This could be a situation where management is not addressing the source of the problem, shifting priorities back and forth between business and safety.  Also of interest is that the magnitude of the cycle is actually increasing with time, indicating that the system is essentially unstable and unsustainable.  Safety culture (red) declines throughout the time frame.

Figure 2
Figure 2 shows the identical conditions but where management implements a more restrained approach, delaying its response to changes in business pressure.  The overall system response is still cyclic, but now the magnitude of the cycles is decreasing and converging on a stable outcome.

Figure 3
Figure 3 is for the same conditions, but the management response is restrained further.  Management takes more time to assess the situation and respond to business pressure conditions.  This approach starts to filter out the cyclic type of response seen in the first two examples and will eventually result in a lower business gap.

Perhaps the most important takeaway from these three simulations is that the total changes in safety culture are not significantly different.  A certain price is being paid for shifting priorities away from safety; however, the ability to reduce and maintain lower business pressure is much better with the last management strategy.
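The damping effect of a more restrained response can be shown with a toy feedback loop.  This is only our sketch of the behavior described in Figures 1 through 3, not the actual NuclearSafetySim model; the delay length, gains and starting gap are invented for illustration.

```python
# Toy version of the dynamic discussed above (our sketch, not NuclearSafetySim):
# management closes a perceived business "gap" at a rate gap/adjustment_time,
# but each action only takes effect after a delay.  An aggressive (short)
# adjustment time produces growing oscillations; a more restrained (longer)
# adjustment time lets the same structure settle down.

def simulate(adjustment_time: float, effect_delay: int = 3, periods: int = 40):
    gap = [100.0]                       # business pressure proxy after a budget cut
    actions = [0.0] * effect_delay      # actions already "in the pipeline"
    for t in range(periods):
        action = gap[-1] / adjustment_time   # management's response this period
        actions.append(action)
        delayed_effect = actions[t]          # effect of the action taken 'effect_delay' periods ago
        gap.append(gap[-1] - delayed_effect) # negative values represent overshoot
    return gap

for at in (1.0, 4.0, 10.0):
    g = simulate(at)
    print(f"adjustment_time={at:>4}: gap at t=10/20/40 = "
          f"{g[10]:9.1f} {g[20]:9.1f} {g[40]:9.1f}")
```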

Figure 4
The last example in this set is shown in Figure 4.  This is a situation where business pressure is gradually ramped up due to a series of small step reductions in budget levels.  Within the simulation we have also set a limit on the extent of management actions.  Initially management takes no action to shift priorities - business pressure remains below a level that safety culture can resist.  Consequently safety culture remains stable.  After the third “bump” in business pressure, the threshold resistance of safety culture is broken and management starts to modestly shift priorities.  Even though business pressure continues to ramp up, management response is capped and does not “chase” closing the business gap.  As a result safety culture suffers only a modest reduction before stabilizing.  This scenario may be more typical of an organization with a fairly strong safety culture - under sufficient pressure it will make modest tradeoffs in priorities but will resist a significant compromise in safety.

Sunday, July 15, 2012

Modeling Safety Culture (Part 3): Simulation Results 1

As promised in our June 29, 2012 post, we are taking the next step to incorporate our mental models of safety culture and decision making in a simple simulation program.  The performance dynamic we described viewed safety culture as a “level”, and the level of safety culture determines its ability to resist pressure associated with competing business priorities. If business performance is not meeting goals, pressure on management is created which can be offset by sufficiently strong safety culture. However if business pressure exceeds the threshold for a given safety culture level, management decision making can be affected, resulting in a shift of resources from safety to business needs. This may relieve some business pressure but create a safety gap that can degrade safety culture, making it potentially even more vulnerable to business pressure.

It is worth expanding on the concept of safety culture as a “level” or, in system dynamics terms, a “stock” - an analogy might be the level of liquid in a reservoir, which may increase or decrease due to flows into and out of the reservoir.  This representation causes safety culture to respond less quickly to changes in system conditions than other factors.  For example, an abrupt cut in an organization’s budget and its pressure on management to respond may occur quite rapidly - however its impact on organizational safety culture will play out more gradually.  Thus “...stocks accumulate change.  They are kind of a memory, storing the results of past actions...stocks cannot be adjusted instantaneously no matter how great the organizational pressures…This vital inertial characteristic of stock and flow networks distinguishes them from simple causal links.”*

Let’s see this in action in the following highly simplified model.  The model considers just two competing priorities: safety and business.  When performance in these categories differs from goals, pressure is created on management and may result in actions to ameliorate the pressure.  In this model management action is limited to shifting resources from one priority to the other.  Safety culture, per our June 29, 2012 post, is an organization’s ability to resist and then respond to competing priorities.  At time zero, a reduction in authorized budget is imposed resulting in a gap (current spending versus authorized spending) and creating business pressure on management to respond.
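To make the stock-and-flow idea concrete, here is a heavily simplified sketch in the spirit of the model just described.  The equations, coefficients, thresholds and time constants are our own illustrative choices, not the actual NuclearSafetySim formulation.

```python
# Bare-bones stock-and-flow sketch of the two-priority model described above.
# Safety culture is a stock that accumulates slowly; business pressure and
# management action respond much faster.  All coefficients are illustrative.

dt = 0.1                      # time step (arbitrary units)
steps = int(50 / dt)

culture = 1.0                 # stock: safety culture level (1.0 = nominal)
budget_gap = 20.0             # created at time zero by the budget reduction
safety_resources = 50.0       # resources currently allocated to safety

for step in range(steps):
    business_pressure = max(budget_gap, 0.0)

    # Management shifts resources from safety to business only when pressure
    # exceeds what the current culture level can resist (a threshold effect).
    resistance_threshold = 15.0 * culture
    excess = business_pressure - resistance_threshold
    action = 0.2 * excess if excess > 0 else 0.0

    # Flows: the action relieves the budget gap but drains safety resources,
    # and the resulting safety gap slowly erodes the culture stock.
    budget_gap -= action * dt
    safety_resources -= action * dt
    safety_gap = max(50.0 - safety_resources, 0.0)
    culture += (-0.002 * safety_gap) * dt      # slow erosion of the stock

    if step % 100 == 0:
        print(f"t={step*dt:4.0f}  pressure={business_pressure:5.1f}  "
              f"action={action:4.2f}  culture={culture:5.3f}")
```

Note the reinforcing loop built into even this crude sketch: as the culture stock erodes, its resistance threshold drops, which permits further shifting of resources away from safety.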

Figure 1
Figure 1 shows the response of management.  Actions are initiated very quickly and start to reduce safety resources to relieve budget pressure.  The plot tracks the initial response, a plateauing to allow effectiveness to be gauged, followed by escalation of action to further reduce the budget gap.

Figure 2
Figure 2 overlays the effect of the management actions on the budget gap and the business pressure associated with the gap.  Immediately following the budget reduction, business pressure rapidly increases and quickly reaches a level sufficient to cause management to start to shift priorities.  The first set of management actions brings some pressure relief; the second set of actions further reduces pressure.  As expected there is some time lag in the response of business pressure to the actions of management.

Figure 3
In Figure 3, the impact of these changes in business pressure and management actions is accumulated in safety culture.  Note first the gradual changes that occur in culture versus the faster and sharper changes in management actions and business pressure.  As management takes action there is a loss of safety priority and safety culture slowly degrades. When further escalation of management action occurs, it is at a point where culture is already lower, making the organization more susceptible to compromising safety priorities.  Safety culture declines further. This type of response is indicative of a feedback loop, which is an important dynamic feature of the system: business pressure causes management actions, those actions degrade safety culture, and degraded culture reduces resistance to further actions.

We invite comments and questions from our readers.


*  John Morecroft, Strategic Modelling and Business Dynamics (John Wiley & Sons, 2007) pp. 59-61.

Monday, April 16, 2012

The Many Causes of Safety Culture Performance

The promulgation of the NRC’s safety culture policy statement and industry efforts to remain out in front of regulatory scrutiny have led to increasing attention to identifying safety culture issues and achieving a consistently strong safety culture.

The typical scenario for the identification of safety culture problems starts with performance deficiencies of one sort or another, identified by the NRC through the inspection process or internally through various quality processes.  When the circumstances of the deficiencies suggest that safety culture traits, values or behaviors are involved, safety culture may be deemed in need of strengthening and a standard prescription is triggered.  This usually includes the inevitable safety culture assessment, re-iteration of safety priorities, retraining in safety culture principles, etc.  The accompanying safety culture surveys focus on perceptions of problems and organizational “hot spots” but rarely delve deeply into underlying causes; they generate anecdotal data based on the perceptions of individuals, indicating whether safety culture traits are well established but generally not asking “why” deficiencies exist.

This approach to safety culture seems to us to suffer from several limitations.  One is that the standard prescription does not necessarily yield improved, sustainable results, an indication that symptoms are being treated instead of causes.  And therein is the source of the other limitation, a lack of explicit consideration of the possible causes that have led to safety culture being deficient.  The standard prescribed fixes include an implicit presumption that safety culture issues are the result of inadequate training, insufficient reinforcement of safety culture values, and sometimes the catchall of “leadership” shortcomings. 

We think there are a number of potential causes that are important to ensuring strong safety culture but are not receiving the explicit attention they deserve.  Whatever the true causes, we believe that there will be multiple causes acting in a systematic manner - i.e., causes that interact and feed back in complex combinations to either reinforce or erode the safety culture state.  For now we want to use this post to highlight the need to think more about the reasons for safety culture problems and whether a “causal chain” exists.  Nuclear safety relies heavily on the concept of root causes as a means to understand the origin of problems and a belief that “fixing the root cause” will “fix the problem.”  But a linear approach may not be effective in understanding or addressing complex organizational dynamics, and concerted efforts in one dimension may lead to emergent issues elsewhere.

In upcoming posts we’ll explore specific causes of safety culture performance and elicit readers’ input on their views and experience.

Tuesday, May 10, 2011

Shifting the Burden

Pitot tube
This post emanates from the ongoing investigations of the crash of Air France flight 447 from Rio de Janeiro to Paris.  In some respects it is a follow-up to our January 27, 2011 post on Air France’s safety culture.  An article in the New York Times Sunday Magazine* explores some of the mysteries surrounding the loss of the plane in mid-Atlantic.  One of the possible theories for the crash involves the pitot tubes used on the Airbus plane.  Pitot tubes are instruments used on aircraft to measure airspeed.  The pitot tube measures the difference between total (stagnation) and static pressure to determine dynamic pressure and therefore velocity of the air stream.  Care must be taken to assure that the pitot tubes do not become clogged with ice or other foreign matter, as clogging would interrupt or corrupt the airspeed signal provided to the pilots and the auto-pilot system.
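For readers unfamiliar with the instrument, the underlying relationship is just Bernoulli’s equation.  The sketch below uses the incompressible approximation and generic numbers of our own choosing; it is not specific to the A330 or the Thales probes.

```python
# Incompressible Bernoulli relation behind a pitot-static airspeed reading:
# dynamic pressure q = p_total - p_static = 0.5 * rho * v**2, so
# v = sqrt(2 * q / rho).  Values are generic illustrations; compressibility
# corrections (which matter at cruise Mach numbers) are ignored here.
import math

def airspeed(p_total_pa: float, p_static_pa: float, rho_kg_m3: float) -> float:
    """True airspeed (m/s) from pitot (total) and static pressure."""
    q = p_total_pa - p_static_pa          # dynamic pressure
    return math.sqrt(2.0 * q / rho_kg_m3)

# Example: a 3,000 Pa pressure difference in thin, high-altitude air
# (assumed density 0.4 kg/m^3) corresponds to roughly 122 m/s.
print(round(airspeed(103_000.0, 100_000.0, 0.4), 1))
# A clogged tube corrupts p_total, and the computed airspeed along with it.
```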

On the flight 447 aircraft, three Thales AA model pitot tubes were in use.  They are produced by a French company and cost approximately $3500 each.  The Times article goes on to explain:

"...by the summer of 2009, the problem of icing on the Thales AA was known to be especially common….Between 2003 and 2008, there were at least 17 cases in which the Thales AA had problems on the Airbus A330 and its sister plane, the A340.  In September 2007, Airbus issued a ‘service bulletin’ suggesting that airlines replace the AA pitots with a newer model, the BA, which was said to work better in ice.”

Air France’s response to the service bulletin established a policy to replace the AA tubes “only when a failure occurred”.  A year later Air France then asked Airbus for “proof” that the model BA tubes worked better in ice.  It took Airbus another 6-7 months to perform tests that demonstrated the superior performance of the BA tubes, following which Air France proceeded with implementing the recommended change for its A330 aircraft.  Unfortunately the new probes had not yet been installed at the time of flight 447.

Much is still unknown about whether in fact the pitot tubes played a role in the crash of flight 447 and of the details of Air France’s consideration of deploying replacements.  But there is a sufficient framework to pose some interesting questions regarding how safety considerations were balanced in the process, and what might be inferred about the Air France safety culture.  Most clearly it highlights how fundamental the decision making process is to safety culture.

What is clear is that Air France’s approach to this problem “shifted the burden” from assuring that something was safe to proving that it was unsafe.  In legal usage this involves transferring the obligation to prove a fact in controversy from one party to another.  Or in systems thinking (which you may have noticed we strongly espouse) it denotes a classic dynamic archetype: a problem arises and can be ameliorated through either a short-term, symptom-based response or a fundamental solution that may take additional time and/or resources to implement.  Choosing the short-term fix provides relief and reinforces the belief in the efficacy of the response.  Meanwhile the underlying problem goes unaddressed.  For Air France, the service bulletin created a problem.  Air France could have immediately replaced the pitot tubes or undertaken its own assessment of pitot tubes with replacement to follow.  This would have taken time and resources.  Nor did Air France appear to address the threshold question of whether the existing AA model instruments were adequate - in nuclear industry terms, were they “operable” and able to perform their safety function?  Air France apparently did not even implement interim measures such as retraining to improve pilots’ recognition of and response to pitot tube failures or incorrect readings.  Instead, Air France shifted the burden back to Airbus to “prove” its recommendation.  The difference between showing that something is not safe versus that it is safe is as wide as, well, the Atlantic Ocean.

What we find particularly interesting about shifting the burden is that it is just another side of the complacency coin.  Most people engaged in safety culture science recognize that complacency is a potential contributor to the decay and loss of effectiveness of safety culture.  Everything appears to be going OK so there is less need to pursue issues, particularly those whose safety impact is unclear.  Not pursuing root causes, not verifying corrective action efficacy, loss of questioning attitude and lack of resources could all be telltale signs of complacency.  The interesting thing about shifting the burden is that it yields much the same result - but with the appearance that action is being taken.

The footnote to the story is the response of Air Caraibes to similar circumstances in this time frame.  The Times article indicates Air Caraibes experienced two “near misses” with Thales AA pitot tubes on A330 aircraft.  They immediately replaced the parts and notified regulators.


*  W.S. Hylton, "What Happened to Air France Flight 447?" New York Times Magazine (May 4, 2011).

Friday, June 18, 2010

Assessing Safety Culture

In our June 15th post, we reported on Wahlström and Rollenhagen’s* concern that trying to measure safety culture could do more harm than good. However, the authors go on to assert that safety culture can and should be assessed. They identify different methods that can be used to perform such assessments, including peer reviews and self assessments. They conclude “Ideally safety culture assessments should be carried out as an interaction between an assessment team and a host organization and it should be aimed at the creation of an awareness of potential safety threats . . . .” (§ 7) We certainly agree with that observation.

We are particularly interested in their comments on safety (performance) indicators, another tool for assessing safety culture. We agree that “. . . most indicators are lagging in the sense that they summarize past safety performance” (§ 6.2) and thus may not be indicative of future performance. In an effort to improve performance indicators, the authors suggest “One approach towards leading safety indicators may be to start with a set of necessary conditions from which one can obtain a reasonable model of how safety is constructed. The necessary conditions would then suggest a set of variables that may be assessed as precursors for safety. An assessment could then be obtained using an ordinal scale and several variables could be combined to set an alarm level.” (ibid.)
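To show what the authors’ suggestion might look like mechanically, here is a minimal sketch.  The precursor variables, the 1-5 ordinal scale, the equal weights and the alarm thresholds are all invented for illustration; they are not drawn from the paper.

```python
# Minimal sketch of combining ordinal precursor scores into an alarm level,
# in the spirit of Wahlström and Rollenhagen's suggestion.  The variables,
# 1-5 ordinal scale, weights and thresholds are all assumptions.

precursor_scores = {            # 1 = healthy, 5 = serious concern
    "backlog of unresolved safety issues": 3,
    "resource pressure on safety work": 4,
    "willingness to raise concerns": 2,
}
weights = {name: 1.0 for name in precursor_scores}   # equal weights assumed

weighted = sum(weights[n] * s for n, s in precursor_scores.items())
worst = max(precursor_scores.values())

# Alarm if the combined score or any single precursor crosses its threshold.
alarm = weighted >= 10 or worst >= 5
print(f"combined score = {weighted:.0f}, worst precursor = {worst}, alarm = {alarm}")
```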

We believe the performance indicator problem should be approached somewhat differently. Safety culture, safety management and safety performance do not exist in a vacuum. We advocate using the principles of system dynamics to construct an organizational performance model that shows safety as both input to and output from other, sometimes competing organizational goals, resource constraints and management actions. This is a more robust approach because it can not only show that safety culture is getting stronger or slipping, but why, i.e., what other organizational factors are causing safety culture change to occur. If the culture is slipping, then analysis of system information can suggest where the most cost-effective interventions can be made. For more information on using system dynamics to model safety culture, please visit our companion website, nuclearsafetysim.com.

*  B. Wahlström and C. Rollenhagen, “Assessments of safety culture – to measure or not?” paper presented at the 14th European Congress of Work and Organizational Psychology, Santiago de Compostela, Spain, May 13-16, 2009.  The authors are also connected with the LearnSafe project, which we have discussed in earlier posts (click the LearnSafe label to see them).

Wednesday, April 28, 2010

Safety Culture: Cause or Context (part 2)

In an earlier post, we discussed how “mental models” of safety culture affect perceptions about how safety culture interacts with other organizational factors and what interventions can be taken if safety culture issues arise. We also described two mental models, the Causal Attitude and Engineered Organization. This post describes a different mental model, one that puts greater emphasis on safety culture as a context for organizational action.

Safety Culture as Emergent and Indeterminate

If the High Reliability Organization model is basically optimistic, the Emergent and Indeterminate model is more skeptical, even pessimistic as some authors believe that accidents are unavoidable in complex, closely linked systems. In this view, “the consequences of safety culture cannot be engineered and only probabilistically predicted.” Further, “safety is understood as an elusive, inspirational asymptote, and more often only one of a number of competing organizational objectives.” (p. 356)* Safety culture is not a cause of action, but provides the context in which action occurs. Efforts to exhaustively model (and thus eventually manage) the organization are doomed to failure because the organization is constantly adapting and evolving.

This model sees that the same processes that produce the ordinary and routine stuff of everyday organizational life also produce the messages of impending problems. But the organization’s necessary cognitive processes tend to normalize and homogenize; the organization can’t very well be expected to treat every input as novel or not previously experienced. In addition, distributed work processes and official security policies can limit the information available to individuals. Troublesome information may be buried or discredited. And finally, “Dangers that are neither spectacular, sudden, nor disastrous, or that do not resonate with symbolic fears, can remain ignored and unattended, . . . .” (p. 357)

We don’t believe safety significant events are inevitable in nuclear organizations but we do believe that the hubris of organizational designers can lead to specific problems, viz., the tendency to ignore data that does not comport with established categories. In our work, we promote a systems approach, based on system dynamics and probabilistic thinking, but we recognize that any mental or physical model of an actual, evolving organization is just that, a model. And the problem with models is that their representation of reality, their “fit,” can change with time. With ongoing attention and effort, the fit may become better but that is a goal, not a guaranteed outcome.

Lessons Learned

What are the takeaways from this review? First, mental models are important. They provide a framework for understanding the world and its information flows, a framework that the holder may believe to be objective but is actually quite subjective and creates biases that can cause the holder to ignore information that doesn’t fit into the model.

Second, the people who are involved in the safety culture discussion do not share a common mental model of safety culture. They form their models with different assumptions, e.g., some think safety culture is a force that can and does affect the vector of organizational behavior, while others believe it is a context that influences, but does not determine, organizational and individual decisions.

Third, safety culture cannot be extracted from its immediate circumstances and examined in isolation. Safety culture always exists in some larger situation, a world of competing goals and significant uncertainty with respect to key factors that determine the organization’s future.

Fourth, there is a risk of over-reliance on surveys to provide some kind of "truth" about an organization’s safety culture, especially if actual experience is judged or minimized to fit the survey results. Since there is already debate about what surveys measure (safety culture or safety climate?), we advise caution.

Finally, in addition to appropriate models and analyses, training, supervision and management, the individual who senses that something is just not right and is supported by an organization that allows, rather than vilifies, alternative interpretations of data is a vital component of the safety system.


* This post draws on Susan S. Silbey, "Taming Prometheus: Talk of Safety and Culture," Annual Review of Sociology, Volume 35, September 2009, pp. 341-369.

Sunday, April 18, 2010

Safety Culture: Cause or Context (part 1)

As we have mentioned before, we are perplexed that people are still spending time working on safety culture definitions. After all, it’s not because of some definitional issue that problems associated with safety culture arise at nuclear plants. Perhaps one contributing factor to the ongoing discussion is that people hold different views of what the essence of safety culture is, views that are influenced by individuals’ backgrounds, experiences and expectations. Consultants, lawyers, engineers, managers, workers and social scientists can and do have different perceptions of safety culture. Using a term from system dynamics, they have different “mental models.”

Examining these mental models is not an empty semantic exercise; one’s mental model of safety culture determines (a) the degree to which one believes it is measurable, manageable or independent, i.e. separate from other organizational features, (b) whether safety culture is causally related to actions or simply a context for actions, and (c) most importantly, what specific strategies for improving safety performance might work.

To help identify different mental models, we will refer to a 2009 academic article by Susan Silbey,* a sociology professor at MIT. Her article does a good job of reviewing the voluminous safety culture literature and assigning authors and concepts into three main categories: Culture as (a) Causal Attitude, (b) Engineered Organization, and (c) Emergent and Indeterminate. To fit into our blog format, we will greatly summarize her paper, focusing on points that illustrate our notion of different mental models, and publish this analysis in two parts.

Safety Culture as Causal Attitude

In this model, safety culture is a general concept that refers to an organization’s collective values, beliefs, assumptions, and norms, often assessed using survey instruments. Explanations of accidents and incidents that focus on or blame an organization’s safety culture are really saying that the then-existing safety culture somehow caused the negative events to occur or can be linked to the events by some causal chain. (For an example of this approach, refer to the Baker Report on the 2005 BP Texas City refinery accident.)

Adopting this mental model, it follows logically that the corrective action should be to fix the safety culture. We’ve all seen, or been a part of, this – a new management team, more training, different procedures, meetings, closer supervision – all intended to fix something that cannot be seen but is explicitly or implicitly believed to be changeable and to some extent measurable.

This approach can and does work in the short run. Problems can arise in the longer-term as non-safety performance goals demand attention; apparent success in the safety area breeds complacency; or repetitive, monotonous reinforcement becomes less effective, leading to safety culture decay. See our post of March 22, 2010 for a discussion of the decay phenomenon.

Perhaps because this model reinforces the notion that safety culture is an independent organizational characteristic, the model encourages involved parties (plant owners, regulators, the public) to view safety culture with a relatively narrow field of view. Periodic surveys and regulatory observations conclude a plant’s safety culture is satisfactory and everyone who counts accepts that conclusion. But then an event occurs like the recent situation at Vermont Yankee and suddenly people (or at least we) are asking: How can eleven employees at a plant with a good safety culture (as indicated by survey) produce or endorse a report that can mislead reviewers on a topic that can affect public health and safety?

Safety Culture as Engineered Organization

This model is evidenced in the work of the High Reliability Organization (HRO) writers. Their general concept of safety culture appears similar to the Causal Attitude camp but HRO differs in “its explicit articulation of the organizational configuration and practices that should make organizations more reliably safe.” (Silbey, p. 353) It focuses on an organization’s learning culture where “organizational learning takes place through trial and error, supplemented by anticipatory simulations.” Believers are basically optimistic that effective organizational prescriptions for achieving safety goals can be identified, specified and implemented.

This model appears to work best in a command and control organization, i.e., the military. Why? Primarily because a specific military service is characterized by a homogeneous organizational culture, i.e., norms are shared both hierarchically (up and down) and across the service. Frequent personnel transfers at all organizational levels remove people from one situation and reinsert them into another, similar situation. Many of the physical settings are similar – one ship of a certain type and class looks pretty much like another; military bases have a common set of facilities.

In contrast, commercial nuclear plants represent a somewhat different population. Many staff members work more or less permanently at a specific plant and the industry could not have come up with more unique physical plant configurations if it had tried. Perhaps it is not surprising that HRO research, including reviews of nuclear plants, has shown strong cultural homogeneity within individual organizations but lack of a shared culture across organizations.

At its best, the model can instill “processes of collective mindfulness” or “interpretive work directed at weak signals.” At its worst, if everyone sees things alike, an organization can “[drift] toward[s] inertia without consideration that things could be different.” (Weick 1999, quoted in Silbey, p.354) Because HRO is highly dependent on cultural homogeneity, it may be less conscious of growing problems if the organization starts to slowly go off the rails, a la the space shuttle Challenger.

We have seen efforts to implement this model at individual nuclear plants, usually by trying to get everything done “the Navy way.” We have even promoted this view when we talked back in the late 1990s about the benefits of industry consolidation and the best practices that were being implemented by Advanced Nuclear Enterprises (a term Bob coined in 1996). Today, we can see that this model provides a temporary, partial answer but can face challenges in the longer run if it does not constantly adjust to the dynamic nature of safety culture.

Stay tuned for Safety Culture: Cause or Context (part 2).

* Susan S. Silbey, "Taming Prometheus: Talk of Safety and Culture," Annual Review of Sociology, Volume 35, September 2009, pp. 341-369.

Monday, March 22, 2010

Safety Culture Dynamics (part 1)

Over the last several years there have been a number of nuclear organizations that have encountered safety culture and climate issues at their plants. Often new leadership is brought to the plant in hopes of stimulating the needed changes in culture. Almost always there is increased training and reiteration of safety values and a safety culture survey to gain a sense of the organizational temperature. It is a little difficult to gauge precisely how effective these measures are - surveys are snapshots in time and direct indicators of safety culture are lacking. In some cases, safety culture appears to respond in the short term to these changes but then loses momentum and backslides further out in time.

How does one explain these types of evolutions in culture? Conventional wisdom has been that culture is leadership driven and when safety culture is deficient, new management can “turn around” the situation. We have argued that the dynamics of safety culture are more complex and are subject to a confluence of factors that compete for the priorities and decisions of the organization. We use simulation models of safety culture to suggest how these various factors can interact and respond to various initiatives. We made a simple simulation that may illustrate the situation at a plant that responds as described above. CLICK ON THIS LINK to see the simulated safety culture dynamic response.

The simulation shows changes in some key variables over time. In this case the time period is 5 years. For approximately the first year the simulation illustrates the status quo prior to the change in leadership. Safety culture was in gradual decline despite nominal attention to actions to reinforce a safety mindset in the organization.

At approximately the one year mark, leadership is changed and actions are taken to significantly increase the safety priority of the organization. This is reflected in a spike in reinforcement that typically includes training, communications and strong management emphasis on the elements of safety culture. Note that following a lag, safety culture starts to improve in response to these changes. As time progresses, the reinforcement curve peaks and starts to decay due to something we refer to as “saturation”. Essentially the new leadership’s message is starting to have less and less impact even though it is being constantly reiterated. For a time safety culture continues to improve but then turns around due to the decreasing effectiveness of reinforcement. Eventually safety culture regresses to a level where many of the same problems start to recur.
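The “saturation” dynamic can be sketched in a few lines.  This is our simplification of the idea, not the Model 3 tutorial itself, and every number below is assumed: each repetition of the same reinforcement message delivers a smaller increment to safety culture, while a constant background decay keeps pulling culture down, so culture rises, peaks, and then regresses once the message’s marginal effect falls below the decay rate.

```python
# Toy illustration of reinforcement saturation: repeated messaging has a
# diminishing marginal effect, while background erosion is constant, so
# safety culture improves for a while and then regresses.  All numbers assumed.

culture = 0.6                  # starting culture level (0-1 scale)
effectiveness = 0.05           # initial boost per period from reinforcement
saturation_rate = 0.92         # each repetition retains 92% of its impact
erosion = 0.01                 # constant background decay per period

history = []
for period in range(60):
    culture += effectiveness - erosion
    culture = min(max(culture, 0.0), 1.0)
    effectiveness *= saturation_rate        # the message wears out
    history.append(culture)

peak = max(history)
print(f"peak culture {peak:.2f} at period {history.index(peak)}, "
      f"final culture {history[-1]:.2f}")
```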

Is this a diagnosis of what is happening at any particular site? No, it is merely suggestive of some of the dynamics that are at work in safety culture. In this particular simulation, other actions that may be needed to build strong, enduring safety culture were not implemented in order to isolate the failure of one-dimensional actions to provide long term solutions. One of the indicators of this narrow approach can be seen in the line on the simulation representing the trust level within the organization. It hardly changes or responds to the other dynamics. Why? In our view trust tends to be driven by the overall, big picture of forces at work and the extent to which they consistently demonstrate safety priority. Reinforcement (in our model) reflects primarily a training and messaging action by management. Other more potent forces include whether management “walks the talk”, whether resources are allocated consistent with safety priorities, whether short term needs are allowed to dominate longer term priorities, whether problems are identified and corrected in a manner to prevent recurrence, etc. In this particular simulation example, these other signals are not entirely consistent with the reinforcement messages, with a net result that trust hardly changes.

More information regarding safety culture simulation is available at the nuclearsafetysim.com website. Under the Models tab, Model 3 provides a short tutorial on the concept of saturation and its effect on safety culture reinforcement.

Thursday, August 13, 2009

Primer on System Dynamics

System Dynamics is a concept for seeing the world in terms of inputs and outputs, where internal feedback loops and time delays can affect system behavior and lead to complex, non-linear changes in system performance.

The System Dynamics worldview was originally developed by Prof. Jay Forrester at MIT. Later work by other thinkers, e.g., Peter Senge, author of The Fifth Discipline, expanded the original concepts and made them available to a broader audience. An overview of System Dynamics can be found on Wikipedia.

Our NuclearSafetySim program uses System Dynamics to model managerial behavior in an environment where maintaining the nuclear safety culture is a critical element. NuclearSafetySim is built using isee Systems iThink software. isee Systems has educational materials available on their website that explain some basic concepts.

There are other vendors in the System Dynamics software space, including Ventana Systems and their Vensim program. They also provide some reference materials, available here.

Wednesday, July 29, 2009

Single Loop, Double Loop – What Is This All About? (MIT #3)

One of the potential benefits of academic papers is the opportunity to put forward a theoretical structure that explains a set of observations.  The MIT paper [pg 4] provides such a theory regarding organizational learning and safety culture.  The authors cite the difference between “single loop” and “double loop” learning as vital to the way organizations respond to performance problems.  Single loop learning “represents the immediate and local actions that individuals and organizations take in response to a perceived problem.”  On the other hand, double loop learning “instead of focusing on enforcement…question[s] why rules were not originally followed…”.

The MIT authors contend that double loop offers the greatest potential benefit to safety, but can be a difficult challenge since “it threatens existing bureaucratic structures”.  And they add an insight that derives from their (and our) view of safety as a dynamic process: “the immediate success of single loop learning can undermine both the motivation and the perceived need to follow through on more substantial improvement efforts…”

How does the theory of single and double loop resonate with your experience?  Do you see single loop being the dominant response within your organization?