
Saturday, May 3, 2014

DOE Report on WIPP's Safety Culture

On Feb. 14, 2014, an incident at the Department of Energy (DOE) Waste Isolation Pilot Plant (WIPP) resulted in the release of radioactive americium and plutonium into the environment.  This post reviews DOE’s Phase 1 incident report*, with an emphasis on safety culture (SC) concerns.

From the Executive Summary

The Accident Investigation Board (the Board) concluded that a more thorough hazard analysis, coupled with a better filter system could have prevented the unfiltered above ground release. (p. ES-1)

The root cause of the incident was Nuclear Waste Partnership’s (NWP**, the site contractor) and the DOE Carlsbad Field Office’s (CBFO) failure to manage the radiological hazard. “The cumulative effect of inadequacies in ventilation system design and operability compounded by degradation of key safety management programs and safety culture [emphasis added] resulted in the release of radioactive material . . . and the delayed/ineffective recognition and response to the release.” (pp. ES 6-7)

The report presents eight contributing causes, most of which point to NWP deficiencies.  SC was included as a site-wide concern; specifically, the site SC does not fully implement DOE safety management policy: “[t]here is a lack of a questioning attitude, reluctance to bring up and document issues, and an acceptance and normalization of degraded equipment and conditions.”  A recent Safety Conscious Work Environment (SCWE) survey suggests a chilled work environment. (p. ES-8)

The report includes 31 conclusions, 4 related to SC.  “NWP and CBFO have allowed the safety culture at the WIPP project to deteriorate . . . Questioning attitudes are not welcomed by management . . . DOE has exacerbated the safety culture problem by referring to numbers of [problem] reports . . . as a measure of [contractor] performance . . . . [NWP and CBFO] failed to identify weaknesses in . . . safety culture.” (pp. ES 14-15, 19-20)

The report includes 47 recommendations (called Judgments of Need), 4 of them related to SC.  They cover leadership behavior (including that of the CBFO site manager), organizational learning, questioning attitude, more extensive use of existing processes to raise issues, engaging outside SC expertise and improving contractor SC-related processes. (ibid.)

Report Details

The body of the report presents the details behind the conclusions and recommendations.  Following are some of the more interesting SC items, starting with our hot button issues: decision making (esp. the handling of goal conflict), corrective action, compensation and backlogs. 

Decision Making

The introduction to section 5 on SC includes an interesting statement:  “In normal human behavior, production behaviors naturally take precedence over prevention behaviors unless there is a strong safety culture - nurtured by strong leadership.” (p. 61)

The report suggests nature has taken its course: WIPP values production first and foremost.  “Eighteen emergency management drills and exercises were cancelled in 2013 due to an impact on operations. . . . Management assessments conducted by the contractor have a primary focus on cost and schedule performance.” (p. 62)  “The functional checks on CAMs [continuous air monitors] were often delayed to allow waste-handling activities to continue.” (p. 64)  “[D]ue consideration for prioritization of maintenance of equipment is not given unless there is an immediate impact on the waste emplacement processes.” (p. ES-17)  These observations evidence an imbalance between the goals of production and prevention (of accidents and incidents) and, following the logic of the introductory statement, a weak SC.

Corrective Action

The corrective action program has problems.  “The [Jan. 2013] SCWE Self-Assessment . . . identified weaknesses in teamwork and mutual respect . . . Other than completing the [SCWE] National Training Center course, . . . no other effective corrective actions have been implemented. . . .”  The Self-Assessment also “identified weaknesses in effective resolution of reported problems.” (p. 63)  For problems that were reported, “The Board noted several instances of reported deficiencies that were either not issued, or for which corrective action plans were not developed or acted on for months.” (p. 65)

Compensation

Here is the complete text of Conclusion 14, which was excerpted above: “DOE has exacerbated the safety culture problem by referring to numbers of ORPS [incident and problem] reports and other deficiency reporting documents, rather than the significance of the events, as a measure of performance by Source Evaluation Boards during contract bid evaluations, and poor scoring on award fee determinations.  Directly tying performance to the number of occurrence reports drives the contractor to non-disclosure of events in order to avoid the poor score. [emphasis added]  This practice is contrary to the Department’s goals of the development and implementation of a strong safety culture across our projects.” (p. ES-15)  ‘Nuff said. 

Backlogs

Maintenance was deferred if it interfered with production.  Equipment and systems were allowed to degrade. (pp. ES-7, ES-17, C-7)  There is no indication that maintenance backlogs were a problem; the work simply wasn’t done.

Other SC Issues

In addition to our Big Four and the issues cited from the Executive Summary, the report mentions the following concerns.  (A listing of all SC deficiencies is presented on p. D-3.)

  • Delay in recognizing and responding to events,
  • Bias for negative conclusions on Unreviewed Safety Question Determinations, and
  • Infrequent presence of NWP management in the underground and surface.

Our Perspective

For starters, the Board appears to have a limited view of what SC is.  They see it as a cause of many of WIPP's problems, one that can be fixed if it is “nurtured by strong leadership” and the report's recommendations are implemented.  The recommendations are familiar and can be summed up as “Row harder!”***  In reality, SC is both cause (it creates the context for decision making) and consequence (it is influenced by the observed actions of all organization members, not just senior management).  SC is an organizational property that cannot be managed directly.

The report is a textbook example of linear, deterministic thinking, especially Appendix E (46 pgs.) on events and causal factors related to the incident.  The report is strong on what happened but weak on why things happened.  In Appendix E, SC appears as a top-level blanket cause of nuclear safety program and radiological event shortcomings (and, to a lesser degree, of the ventilation, CAM and ground control problems), but there is no insight into how SC interacts with other organizational variables or with WIPP’s external (political, regulatory, DOE policy) environment.

Here’s an example of what we’re talking about, viz., how one might gain some greater insight into a problem by casting a wider net and applying a bit of systems thinking.  The report faults DOE HQ for ineffective oversight, providing inadequate resources and not holding CBFO accountable for performance.  The recommended fix is for DOE HQ “to better define and execute their roles and responsibilities” for oversight and other functions. (p. ES-21)  That’s all what and no why.  Is there some basic flaw in the control loop involving DOE HQ, CBFO and NWP?  DOE HQ probably believes it transmits unambiguous orders and expectations through its official documents—why weren’t they being implemented in the field and why didn’t DOE know it?  Is the information flow from DOE to CBFO to NWP clear and adequate (policies, goals); how about the flow in the opposite direction (performance feedback, problems)?  Is something being lost in the translation from one entity to another?  Does this control problem exist between DOE HQ and other sites, i.e., is it a systemic problem?  Who knows.****
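To make that concrete, here is a toy model of the kind of control loop we have in mind.  Everything in it is assumed for illustration (the reporting fraction, the delay and the pressure rate are invented numbers, not figures from the report); it only shows how a filtered, delayed feedback channel lets the field situation diverge from what headquarters believes it is directing.

    # Toy control loop: HQ policy -> site behavior -> (delayed, filtered) feedback -> HQ.
    # All parameters are invented for illustration; none come from the DOE report.
    REPORTING_FRACTION = 0.3   # share of field problems that ever reach HQ
    FEEDBACK_DELAY = 4         # quarters before a report works its way up
    PRESSURE = 1.0             # new problems generated per quarter under production pressure

    field_backlog = 0.0
    pipeline = [0.0] * FEEDBACK_DELAY   # reports in transit to HQ

    for quarter in range(1, 13):
        field_backlog += PRESSURE                       # problems accumulate at the site
        pipeline.append(PRESSURE * REPORTING_FRACTION)  # only a fraction is ever reported
        known_to_hq = pipeline.pop(0)                   # and it arrives late
        field_backlog -= known_to_hq                    # HQ acts only on what it sees
        print(f"Q{quarter:2d}: field backlog {field_backlog:4.1f}, HQ saw {known_to_hq:3.1f}")

In a healthy loop the reporting fraction approaches one and the delay shrinks, so the backlog HQ “sees” tracks the backlog that actually exists.  The report never tells us which parts of the loop were broken at WIPP, only that the loop as a whole was not working.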

Are there other unexamined factors that make WIPP's problems more likely?  For example, has WIPP escaped the scrutiny and centralized controls that DOE applies to other entities?  As a consequence, has WIPP had too much autonomy to adjust its behavior to match its perception of the task environment?  Are DOE’s and WIPP’s mental models of the task environment similar or even adequate?  Perhaps WIPP (and possibly DOE) see the task environment as simpler than it actually is, and therefore the strategies for handling the environment lack requisite variety.  Was there an assumption that NWP would continue the apparently satisfactory performance of the previous contractor?  It's obvious these questions do not specifically address SC but they seek to ascertain how the organizations involved are actually functioning, and SC is an important variable in the overall system.

Contrast with Other DOE SC Investigations 


This report presents a sharp contrast to the foot-dragging that takes place elsewhere in DOE.  Why can’t DOE bring a similar sense of urgency to the SC investigations it is supposed to be conducting at its other facilities?  Was the WIPP incident that big a deal (because it involved a radioactive release) or is it merely something that DOE can wrap its head around?  (After all, WIPP is basically an underground warehouse.)  In any event, something rang DOE’s bell because they quickly assembled a 5-member board with 16 advisor/consultants and produced a 300-page report in less than two months.*****

Bottom line: You don't need to pore over this report but it provides some perspective on how DOE views SC and demonstrates that a giant agency can get moving if it's motivated to do so.


*  DOE Office of Environmental Management, “Accident Investigation Report: Radiological Release Event at the Waste Isolation Pilot Plant on February 14, 2014, Phase 1” (April 2014).  Retrieved April 30, 2014.  Our thanks to Mark Lyons who posted this report on the LinkedIn Nuclear Safety group discussion board.

**  NWP LLC was formed by URS Energy and Construction, Inc. and Babcock & Wilcox Technical Services Group, Inc.  Their major subcontractor is AREVA Federal Services, LLC.  All three firms perform work at other, i.e., non-WIPP, DOE facilities.  NWP assumed management of WIPP on Oct. 1, 2012.  From NWP website.  Retrieved May 2, 2014.

***  To the Board's credit, they did not go looking for individual scapegoats to blame for WIPP's difficulties.

****  In fairness, the report has at least one example of a feedback loop in the CBFO-NWP sub-system: CBFO's use of the condition reports as an input to NWP’s compensation review and NWP's predictable reaction of creating fewer condition reports.

*****  The Accident Investigation Board was appointed on Feb. 27, 2014 and completed its Phase 1 investigation on March 28, 2014.  The Phase 1 report was released to the public on April 22, 2014.

Thursday, January 9, 2014

Safety Culture Training Labs

[Image caption: Not a SC Training Lab]
This post highlights a paper* Carlo Rusconi presented at the American Nuclear Society meeting last November.  He proposes the use of “training labs” to develop improved safety culture (SC) through team-building exercises, e.g., role play, and table-top simulations.  Team building increases (a) participants' awareness of group dynamics, e.g., feedback loops, and how a group develops shared beliefs and (b) sensitivity to the viewpoints of others, viewpoints that may differ greatly based on individual experience and expectations.  The simulations pose evolving scenarios that participants must analyze and then address with a team approach.  A key rationale for this type of training is that “team interactions, if properly developed and trained, have the capacity to counter-balance individual errors.” (p. 2155)

Rusconi's recognition of goal conflict in organizations, the weakness of traditional methods (e.g., PRA) for anticipating human reactions to emergent issues, the need to recognize different perspectives on the same problem and the value of simulation in training are all familiar themes here at Safetymatters.

Our Perspective

Rusconi's work also reminds us how seldom new approaches for addressing SC concepts, issues, training and management appear in the nuclear industry.  Per Rusconi, “One of the most common causes of incidents and accidents in the industrial sector is the presence of hidden or clear conflicts in the organization. These conflicts can be horizontal, in departments or in working teams, or vertical, between managers and workers.” (p. 2156)  However, we see scant evidence of the willingness of the nuclear industry to acknowledge and address the influence of goal conflicts.

Rusconi focuses on training to help recognize and overcome conflicts.  This is good, but one needs to clearly identify how training would accomplish this and what its limitations are.  For example, if promotion is impacted by raising safety issues or advocating conservative responses, is training going to be an effective remedy?  The truth is there are some conflicts which are implicit (but very real) and hard to mitigate.  Such conflicts can arise from corporate goals, resource allocation policies and performance-based executive compensation schemes.  Some of these conflicts originate high in the organization and are not really amenable to training per se.

Both Rusconi's approach and our NuclearSafetySim tool attempt to stimulate discussion of conflicts and develop rules for resolving them.  Creating a measurable framework tied to the actual decisions made by the organization is critical to dealing with conflicts.  Part of this is creating measures for how well decisions embody SC, as done in NuclearSafetySim.

Perhaps this means the only real answer for high risk industries is to have agreement on standards for safety decisions.  This doesn't mean some highly regimented PRA-type approach.  It is more of a peer-type process incorporating scales for safety significance, decision quality, etc.  This should be the focus of the site safety review committees and third-party review teams.  And the process should look at samples of all decisions, not just those that result in a problem and wind up in the corrective action program (CAP).
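As a sketch of what we mean (our own illustration, not an existing industry tool and not the internals of NuclearSafetySim), such a peer-review process needs little more than a record per decision, a couple of agreed scales and a sampling rule that draws from all decisions rather than only the CAP population.  The scales, field names and example entries below are hypothetical.

    # Sketch of a peer decision-review record and a sampling rule.
    # The scales, field names and example entries are hypothetical.
    from dataclasses import dataclass
    import random

    @dataclass
    class DecisionReview:
        decision_id: str
        description: str
        safety_significance: int   # 1 (minor) .. 5 (major), assigned by the peer panel
        decision_quality: int      # 1 (poor process) .. 5 (sound process)
        adverse_outcome: bool      # tracked separately so process is judged on its own merits

    def sample_for_review(decision_log, fraction=0.1, seed=2014):
        """Sample a fraction of ALL decisions, not just those that landed in the CAP."""
        rng = random.Random(seed)
        k = max(1, int(len(decision_log) * fraction))
        return rng.sample(decision_log, k)

    log = [
        DecisionReview("D-101", "Defer monitor functional check to keep operations running", 4, 2, False),
        DecisionReview("D-102", "Add independent reviewer to ventilation mode change", 3, 5, False),
        DecisionReview("D-103", "Cancel drill because of impact on operations", 4, 2, False),
    ]
    for d in sample_for_review(log, fraction=0.67):
        print(f"{d.decision_id}: significance={d.safety_significance}, quality={d.decision_quality}")

The point of separating decision quality from outcome is exactly the one made above: a sound process can still produce a bad result, and a sloppy one can get lucky.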

Nuclear managers would probably be very reluctant to embrace this much transparency.  A benign view is they are simply too comfortable believing that the "right" people will do the "right" thing.  A less charitable view is their lack of interest in recognizing goal conflicts and other systemic issues is a way to effectively deny such issues exist.

Instead of interest in bigger-picture “Why?” questions we see continued introspective efforts to refine existing methods, e.g., cause analysis.  At its best, cause analysis and any resultant interventions can prevent the same problem from recurring.  At its worst, cause analysis looks for a bad component to redesign or a “bad apple” to blame, train, oversee and/or discipline.

We hate to start the new year wearing our cranky pants but Dr. Rusconi, ourselves and a cadre of other SC analysts are all advocating some of the same things.  Where is any industry support, dialogue, or interaction?  Are these ideas not robust?  Are there better alternatives?  It is difficult to understand the lack of engagement on big-picture questions by the industry and the regulator.


*  C. Rusconi, “Training labs: a way for improving Safety Culture,” Transactions of the American Nuclear Society, Vol. 109, Washington, D.C., Nov. 10–14, 2013, pp. 2155-57.  This paper reflects a continuation of Dr. Rusconi's earlier work which we posted on last June 26, 2013.

Saturday, July 6, 2013

Behind Human Error by Woods, Dekker, Cook, Johannesen and Sarter

This book* examines how errors occur in complex socio-technical systems.  The authors' thesis is that behind every ascribed “human error” there is a “second story” of the context (conditions, demands, constraints, etc.) created by the system itself.  “That which we label “human error” after the fact is never the cause of an accident.  Rather, it is the cumulative effect of multiple cognitive, collaborative, and organizational factors.” (p. 35)  In other words, “Error is a symptom indicating the need to investigate the larger operational systems and the organizational context in which it functions.” (p. 28)  This post presents a summary of the book followed by our perspective on its value.  (The book has a lot of content so this will not be a short post.)

The Second Story

This section establishes the authors' view of error and how socio-technical systems function.  They describe two mutually exclusive world views: (1) “erratic people degrade an otherwise safe system” vs. (2) “people create safety at all levels of the socio-technical system by learning and adapting . . .” (p. 6)  It should be obvious that the authors favor option 2.

In such a world “Failure, then, represents breakdowns in adaptations directed at coping with complexity.  Indeed, the enemy of safety is not the human: it is complexity.” (p. 1)  “. . . accidents emerge from the coupling and interdependence of modern systems.” (p. 31) 

Adaptation occurs in response to pressures or environmental changes.  For example, systems are under stakeholder pressure to become faster, better, cheaper; multiple goals and goal conflict are regular complex system characteristics.  But adaptation is not always successful.  There may be too little (rules and procedures are followed even though conditions have changed) or too much (adaptation is attempted with insufficient information to achieve goals).  Because of pressure, adaptations evolve toward performance boundaries, in particular, safety boundaries.  There is a drift toward failure. (see Dekker, reviewed here)

The authors present 15 premises for analyzing errors in complex socio-technical systems. (pp. 19-30)  Most are familiar but some are worth highlighting and remembering when thinking about system errors:

  • “There is a loose coupling between process and outcome.”  A “bad” process does not always produce bad outcomes and a “good” process does not always produce good outcomes.
  • “Knowledge of outcome (hindsight) biases judgments about process.”  More about that later.
  • “Lawful factors govern the types of erroneous actions or assessments to be expected.”   In other words, “errors are regular and predictable consequences of a variety of factors.”
  • “The design of artifacts affects the potential for erroneous actions and paths towards disaster.”  This is Human Factors 101 but problems still arise.  “Increased coupling increases the cognitive demands on practitioners.”  Increased coupling plus weak feedback can create a latent failure.

Complex Systems Failure


This section covers traditional mental models used for assessing failures and points out the putative inadequacies of each.  The sequence-of-events (or domino) model is familiar Newtonian causal analysis.  Man-made disaster theory puts company culture and institutional design at the heart of the safety question.  Vulnerability develops over time but is hidden by the organization’s belief that it has risk under control.  A system or component is driven into failure.  The latent failure (or Swiss cheese) model proposes that “disasters are characterized by a concatenation of several small failures and contributing events. . .” (p. 50)  While a practitioner may be closest to an accident, the associated latent failures were created by system managers, designers, maintainers or regulators.  All these models reinforce the search for human error (someone untrained, inattentive or a “bad apple”) and the customary fixes (more training, procedure adherence and personal attention, or targeted discipline).  They represent a failure to adopt systems thinking and concepts of dynamics, learning, adaptation and the notion that a system can produce accidents as a natural consequence of its normal functioning.

A more sophisticated set of models is then discussed.  Perrow's normal accident theory says that “accidents are the structural and virtually inevitable product of systems that are both interactively complex and tightly coupled.” (p. 61)  Such systems structurally confuse operators and prevent them from recovering when incipient failure is discovered.  People are part of the Perrowian system and can exhibit inadequate expertise.  Control theory sees systems as composed of components that must be kept in dynamic equilibrium based on feedback and continual control inputs—basically a system dynamics view.  Accidents are a result of normal system behavior and occur when components interact to violate safety constraints and the feedback (and control inputs) do not reflect the developing problems.  Small changes in the system can lead to huge consequences elsewhere.  Accident avoidance is based on making system performance boundaries explicit and known although the goal of efficiency will tend to push operations toward the boundaries.  In contrast, the authors would argue for a different focus: making the system more resilient, i.e., error-tolerant.**  High reliability theory describes how high-hazard activities can achieve safe performance through leadership, closed systems, functional decentralization, safety culture, redundancy and systematic learning.  High reliability means minimal variations in performance, which, in the short term, means safe performance, but HROs are subject to incidents indicative of residual system noise and unseen changes from social forces, information management or new technologies. (See Weick, reviewed here)

Standing on the shoulders of the above sophisticated models, resilience engineering (RE) is proposed as a better way to think about safety.  According to this model, accidents “represent the breakdowns in the adaptations necessary to cope with the real world complexity.” (p. 83)  The authors use the Columbia space shuttle disaster to illustrate patterns of failure evident in complex systems: drift toward failure, past success as reason for continued confidence, fragmented problem-solving, ignoring new evidence and intra-organizational communication breakdowns.  To oppose or compensate for these patterns, RE proposes monitoring or enhancing other system properties including: buffering capacity, flexibility, margin and tolerance (which means replacing quick collapse with graceful degradation).  RE “focuses on what sustains or erodes the adaptive capacities of human-technical systems in a changing environment.” (p. 93)  In practice, that means detecting signs of increasing risk, having resources for safety available, and recognizing when and where to invest to offset risk.  It also requires focusing on organizational decision making, e.g., cross checks for risky decisions, the safety-production-efficiency balance and the reporting and disposition of safety concerns.  “Enhancing error tolerance, detection and recovery together produce safety.” (p. 26)

Operating at the Sharp End

An organization's sharp end is where practitioners apply their expertise in an effort to achieve the organization's goals.  The blunt end is where support functions, from administration to engineering, work.  The blunt end designs the system, the sharp end operates it.  Practitioner performance is affected by cognitive activities in three areas: activation of knowledge, the flow of attention and interactions among multiple goals.

The knowledge available to practitioners arrives as organized content.  Challenges include: the organization of that content may be poor, and the content itself may be incomplete or simply wrong.  Practitioner mental models may be inaccurate or incomplete without the practitioners realizing it, i.e., they may be poorly calibrated.  Knowledge may be inert, i.e., not accessed when it is needed.  Oversimplifications (heuristics) may work in some situations but produce errors in others and limit the practitioner's ability to account for uncertainties or conflicts that arise in individual cases.  The discussion of heuristics suggests Hollnagel, reviewed here.

“Mindset is about attention and its control.” (p. 114)  Attention is a limited resource.  Problems with maintaining effective attention include loss of situational awareness, in which the practitioner's mental model of events doesn't match the real world, and fixation, where the practitioner's initial assessment of a situation creates a going-forward bias against accepting discrepant data and a failure to trigger relevant inert knowledge.  Mindset seems similar to HRO mindfulness. (see Weick)

Goal conflict can arise from many sources including management policies, regulatory requirements, economic (cost) factors and risk of legal liability.  Decision making must consider goals (which may be implicit), values, costs and risks—which may be uncertain.  Normalization of deviance is a constant threat.  Decision makers may be held responsible for achieving a goal but lack the authority to do so.  The conflict between cost and safety may be subtle or unrecognized.  “Safety is not a concrete entity and the argument that one should always choose the safest path misrepresents the dilemmas that confront the practitioner.” (p. 139)  “[I]t is difficult for many organizations (particularly in regulated industries) to admit that goal conflicts and tradeoff decisions arise.” (p. 139)  Overall, the authors present a good discussion of goal conflict.

How Design Can Induce Error


The design of computerized devices intended to help practitioners can instead lead to greater risks of errors and incidents.  Specific causes of problems include clumsy automation, limited information visibility and mode errors. 

Automation is supposed to increase user effectiveness and efficiency.  However, clumsy automation creates situations where the user loses track of what the computer is set up to do, what it's doing and what it will do next.  If support systems are so flexible that users can't know all their possible configurations, they adopt simplifying strategies which may be inappropriate in some cases.  Clumsy automation leads to more (instead of less) cognitive work, user attention is diverted to the machine instead of the task, increased potential for new kinds of errors and the need for new user knowledge and judgments.  The machine effectively has its own model of the world, based on user inputs, data sensors and internal functioning, and passes that back to the user.

Machines often hide a mass of data behind a narrow keyhole of visibility into the system.  Successful design creates “a visible conceptual space meaningfully related to activities and constraints in a field of practice.” (p. 162)  In addition, “Effective representations highlight  'operationally interesting' changes for sequences of behavior . . .” (p. 167)  However, default displays typically do not make interesting events directly visible.

A mode error occurs when an operator initiates an action that would be appropriate if the machine were in mode A but, in fact, it's in mode B.  (This may be a man-machine problem but it's not the machine's fault.)  A machine can change modes based on situational and system factors in addition to operator input.  Operators have to maintain mode awareness, not an easy task when viewing a small, cluttered display that may not highlight current mode or mode changes.

To cope with bad design “practitioners adapt information technology provided for them to the immediate tasks at hand in a locally pragmatic way, . . .” (p. 191)  They use system tailoring where they adapt the device, often by focusing on a feature set they consider useful and ignoring other machine capabilities.  They use task tailoring where they adapt strategies to accommodate constraints imposed by the new technology.  Both types of adaptation can lead to success or eventual failures. 

The authors suggest various countermeasures and design changes to address these problems. 

Reactions to Failure

Different approaches for analyzing accidents lead to different perspectives on human error. 

Hindsight bias is “the tendency for people to 'consistently exaggerate what could have been anticipated in foresight.'” (p. 15)  It reinforces the tendency to look for the human in the human error.  Operators are blamed for bad outcomes because they are available, tracking back to multiple contributing causes is difficult, most system performance is good and investigators tend to judge process quality by its outcome.  Outsiders tend to think operators knew more about their situation than they actually did.  Evaluating process instead of outcome is also problematic.  Process and outcome are loosely coupled and what standards should be used for process evaluation?  Formal work descriptions “underestimate the dilemmas, interactions between constraints, goal conflicts, and tradeoffs present in the actual workplace.” (p. 208)  A suggested alternative approach is to ask what other practitioners would have done in the same situation and build a set of contrast cases.  “What we should not do, . . . is rely on putatively objective external evaluations . . . such as . . . court cases or other formal hearings.  Such processes in fact institutionalize and legitimate the hindsight bias . . . leading to blame and a focus on individual actors at the expense of a system view.” (pp. 213-214)

Distancing through differencing is another risk.  In this practice, reviewers focus on differences between the context surrounding an accident and their own circumstance.  Blaming individuals reinforces belief that there are no lessons to be learned for other organizations.  If human error is local and individual (as opposed to systemic) then sanctions, exhortations to follow the procedures and remedial training are sufficient fixes.  There is a decent discussion of TMI here, where, in the authors' opinion, the initial sense of fundamental surprise and need for socio-technical fixes was soon replaced by a search for local, technologically-focused solutions.
      
There is often pressure to hold people accountable after incidents or accidents.  One answer is a “just culture” which views incidents as system learning opportunities but also draws a line between acceptable and unacceptable behavior.  Since the “line” is an attribution, the key question for any organization is who gets to draw it.  Another challenge is defining the discretionary space where individuals alone have the authority to decide how to proceed.  There is more on just culture but this is all (or mostly) Dekker. (see our Just Culture commentary here)

The authors' recommendations for analyzing errors and improving safety can be summed up as follows: recognize that human error is an attribution; pursue second stories that reveal the multiple, systemic contributors to failure; avoid hindsight bias; understand how work really gets done; search for systemic vulnerabilities; study how practice creates safety; search for underlying patterns; examine how change will produce new vulnerabilities; use technology to enhance human expertise; and tame complexity. (p. 239)  “Safety is created at the sharp end as practitioners interact with hazardous processes . . . using the available tools and resources.” (p. 243)

Our Perspective

This is a book about organizational characteristics and socio-technical systems.  Recommendations and advice are aimed at organizational policy makers and incident investigators.  The discussion of a “just culture” is the only time culture is discussed in detail although safety culture is mentioned in passing in the HRO write-up.

Our first problem with the book is its repeated reference to medicine, aviation, aircraft carrier operations and nuclear power plants as complex systems.***  Although medicine is definitely complex and aviation (including air traffic control) possibly is, carrier operations and nuclear power plants are simply complicated.  While carrier and nuclear personnel have to make some adaptations on the fly, they do not face sudden, disruptive changes in their technologies or operating environments and they are not exposed to cutthroat competition.  Their operations are tightly coordinated but, where possible, by design more loosely coupled to facilitate recovery if operations start to go sour.  In addition, calling nuclear power operations complex perpetuates the myth that nuclear is “unique and special” and thus merits some special place in the pantheon of industry.  It isn't and it doesn't.

Our second problem relates to the authors' recasting of the nature of human error.  We decry the rush to judgment after negative events, particularly a search limited to identifying culpable humans.  The search for bad apples or outright criminals satisfies society's perceived need to bring someone to justice and the corporate system's desire to appear to fix things through management exhortations and training without really admitting systemic problems or changing anything substantive, e.g., the management incentive plan.  The authors' plea for more systemic analysis is thus welcome.

But they push the pendulum too far in the opposite direction.  They appear to advocate replacing all human errors (except for gross negligence, willful violations or sabotage) with systemic explanations, aka rationalizations.  What is never mentioned is that medical errors lead to tens of thousands of preventable deaths per year.****  In contrast, U.S. commercial aviation has not experienced over a hundred fatalities (excluding 9/11) since 1996; carriers and nuclear power plants experience accidents, but there are few fatalities.  At worst, this book is a denial that real human errors (including bad decisions, slip ups, impairments, coverups) occur and a rationalization of medical mistakes caused by arrogance, incompetence, class structure and lack of accountability.

This is a dense book, 250 pages of small print, with an index that is nearly useless.  Pressures (most likely cost and schedule) have apparently pushed publishing to the system boundary for copy editing—there are extra, missing and wrong words throughout the text.

This 2010 second edition updates the original 1994 monograph.  Many of the original ideas have been fleshed out elsewhere by the authors (primarily Dekker) and others.  Some references, e.g., Hollnagel, Perrow and the HRO school, should be read in their original form. 


*  D.D. Woods, S. Dekker, R. Cook, L. Johannesen and N. Sarter, Behind Human Error, 2d ed.  (Ashgate, Burlington, VT: 2010).  Thanks to Bill Mullins for bringing this book to our attention.

**  There is considerable overlap of the perspectives of the authors and the control theorists (Leveson and Rasmussen are cited in the book).  As an aside, Dekker was a dissertation advisor for one of Leveson's MIT students.

***  The authors' different backgrounds contribute to this mash-up.  Cook is a physician, Dekker is a pilot and some of Woods' cited publications refer to nuclear power (and aviation).

****  M. Makary, “How to Stop Hospitals From Killing Us,” Wall Street Journal online (Sept. 21, 2012).  Retrieved July 4, 2013.

Wednesday, May 8, 2013

Safety Management and Competitiveness

[Photo: Jean-Marie Rousseau]
We recently came across a paper that should be of significant interest to nuclear safety decision makers.  “Safety Management in a Competitiveness Context” was presented in March 2008 by Jean-Marie Rousseau of the Institut de Radioprotection et de Surete Nucleaire (IRSN).  As the title suggests the paper examines the effects of competitive pressures on a variety of nuclear safety management issues including decision making and the priority accorded safety.  Not surprisingly:

“The trend to ignore or to deny this phenomenon is frequently observed in modern companies.” (p. 7)

The results presented in the paper come from a safety assessment IRSN performed to examine safety management of EDF [Electricite de France] reactors, including:

“How real is the ‘priority given to safety’ in the daily arbitrations made at all nuclear power plants, particularly with respect to the other operating requirements such as costs, production, and radiation protection or environmental constraints?” (p. 2)

The pertinence is clear as “priority given to safety” is the linchpin of safety culture policy and expected behaviors.  In addition the assessment focused on decision-making processes at both the strategic and operational levels.  As we have argued, decisions can provide significant insights into how safety culture is operationalized by nuclear plant management. 

Rousseau views nuclear operations as a “highly complex socio-technical system” and his paper provides a brief review of historical data where accidents or near misses displayed indications of the impact of competing priorities on safety.  The author notes that competitiveness, like safety, is necessary, and as such it represents another risk that must be managed at organizational and managerial levels.  This characterization is intriguing and merits further reflection, particularly by regulators in their pursuit of “risk informed regulation”.  Nominally, regulators apply a conceptualization of risk that is centered on hardware and natural phenomena.  But safety culture and competitive pressures could also be treated as risks to assuring safety - in fact, much more dynamic risks - and thus be part of the framework of risk informed regulation.*  Often, as is the case with this paper, there is a tendency to assert that achievement of safety is coincident with overall performance excellence - which in a broad sense it is - but there are nonetheless many instances of considerable tension - and potential risk.

Perhaps most intriguing in the assessment is the evaluation of EDF’s a posteriori analyses of its decision making processes as another dimension of experience feedback.**   We quote the paper at length:

“The study has pointed out that the OSD***, as a feedback experience tool, provides a priori a strong pedagogic framework for the licensee. It offers a context to organize debates about safety and to share safety representations between actors, illustrated by a real problematic situation. It has to be noticed that it is the only tool dedicated to “monitor” the safety/competitiveness relationship.

"But the fundamental position of this tool (“not to make judgment about the decision-maker”) is too restrictive and often becomes “not to analyze the decision”, in terms of results and effects on the given situation.

"As the existence of such a tool is judged positively, it is necessary to improve it towards two main directions:
- To understand the factors favouring the quality of a decision-making process. To this end, it is necessary to take into account the decision context elements such as time pressure, fatigue of actors, availability of supports, difficulties in identifying safety requirements, etc.
- To understand why a “qualitative decision-making process” does not always produce a “right decision”. To this end, it is necessary to analyze the decision itself with the results it produces and the effects it has on the situation.” (p. 8)

We feel this is a very important aspect that currently receives insufficient attention.  Decisions can provide a laboratory of safety management performance and safety culture actualization.  But how often are decisions adequately documented, preserved, critiqued and shared within the organization?  Decisions that yield a bad (reportable) result may receive scrutiny internally and by regulators, but our studies indicate there is rarely sufficient forensic analysis - cause analyses are almost always one-dimensional and hardware- and process-oriented.  Decisions with benign outcomes - whether the result of “good” decision making or not - are rarely preserved or assessed.  The potential benefits of detailed consideration of decisions have been demonstrated in many of the independent assessments of accidents (Challenger, Columbia, the BP Texas City refinery, etc.) and in research by Perin and others.

We would go a step further than the proposed enhancements to the OSD.  As Rousseau notes, there are downsides to routine post-hoc scrutiny of actual decisions - for one, it will likely identify management errors even in the absence of a bad decision outcome.  This would be one more pressure on managers already challenged by a highly complex decision environment.  An alternative is to provide managers the opportunity to “practice” making decisions in an environment that supports learning and dialogue on achieving the proper balances in decisions - in other words, in a safety management simulator.  The industry requires licensed operators to practice operations decisions on a simulator for similar reasons - why not nuclear managers charged with making safety decisions?
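To illustrate the idea (this is a deliberately tiny sketch, not NuclearSafetySim or any real training tool; the scenario and scoring are invented), a management simulator needs only scenarios, a menu of plausible responses and explicit scoring of the safety/production trade-off, so the balance struck by each decision becomes visible and discussable.

    # Minimal sketch of one turn in a safety-management decision simulator.
    # The scenario, options and scores are invented for illustration.
    scenarios = [{
        "situation": "Degraded ventilation fan; immediate repair costs two days of production.",
        "options": {
            "defer repair":                                 {"production": +2, "safety_margin": -3},
            "repair now":                                   {"production": -2, "safety_margin": +2},
            "compensatory measures, then scheduled repair": {"production": -1, "safety_margin": +1},
        },
    }]

    def play(choices):
        production, safety_margin = 0, 0
        for scenario, choice in zip(scenarios, choices):
            effect = scenario["options"][choice]
            production += effect["production"]
            safety_margin += effect["safety_margin"]
            print(f"'{choice}': cumulative production {production:+d}, safety margin {safety_margin:+d}")
        return production, safety_margin

    play(["compensatory measures, then scheduled repair"])

The value is not in the numbers but in forcing managers to defend, in front of peers, why a given balance was acceptable - the same discussion the OSD is supposed to provoke after the fact.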



*  As the IAEA has noted, “A danger of concentrating too much on a quantitative risk value that has been generated by a PSA [probabilistic safety analysis] is that...a well-designed plant can be operated in a less safe manner due to poor safety management by the operator.”  IAEA-TECDOC-1436, Risk Informed Regulation of Nuclear Facilities: Overview of the Current Status, February 2005.

**  EDF implemented safety-availability-radiation protection-environment observatories (SAREOs) to increase awareness of the arbitration between safety and other performance factors.  SAREOs analyze in each station the quality of the decision-making process and propose actions to improve it and to guarantee compliance with rules in any circumstances. [“Nuclear Safety: our overriding priority,” EDF Group’s file responding to FTSE4Good nuclear criteria]


***  Per Rousseau, “The OSD (Observatory for Safety/Availability) is one of the “safety management levers” implemented by EDF in 1997. Its objective is to perform retrospective analyses of high-stake decisions, in order to improve decision-making processes.” (p. 7)

Thursday, December 20, 2012

The Logic of Failure by Dietrich Dörner

This book was mentioned in a nuclear safety discussion forum so we figured this is a good time to revisit Dörner's 1989 tome.* Below we provide a summary of the book followed by our assessment of how it fits into our interest in decision making and the use of simulations in training.

Dörner's work focuses on why people fail to make good decisions when faced with problems and challenges. In particular, he is interested in the psychological needs and coping mechanisms people exhibit. His primary research method is observing test subjects interact with simulation models of physical sub-worlds, e.g., a malfunctioning refrigeration unit, an African tribe of subsistence farmers and herdsmen, or a small English manufacturing city. He applies his lessons learned to real situations, e.g., the Chernobyl nuclear plant accident.

He proposes a multi-step process for improving decision making in complicated situations then describes each step in detail and the problems people can create for themselves while executing the step. These problems generally consist of tactics people adopt to preserve their sense of competence and control at the expense of successfully achieving overall objectives. Although the steps are discussed in series, he recognizes that, at any point, one may have to loop back through a previous step.

Goal setting

Goals should be concrete and specific to guide future steps. The relationships between and among goals should be specified, including dependencies, conflicts and relative importance. When people don't do this, they can become distracted by obvious or unimportant (although potentially achievable) goals, or by peripheral issues they know how to address rather than important issues that should be resolved. Facing performance failure, they may attempt to turn failure into success with doublespeak or blame unseen forces.

Formulate models and gather information

Good decision-making requires an adequate mental model of the system being studied—the variables that comprise the system and the functional relationships among them, which may include positive and negative feedback loops. The model's level of detail should be sufficient to understand the interrelationships among the variables the decision maker wants to influence. Unsuccessful test subjects were inclined to use a “reductive hypothesis,” which unreasonably reduces the model to a single key variable, or overgeneralization.

Information gathered is almost always incomplete and the decision maker has to decide when he has enough to proceed. The more successful test subjects asked more questions and made fewer decisions (than the less successful subjects) in the early time periods of the sim.

Predict and extrapolate

Once a model is formulated, the decision maker must attempt to determine how the values of variables will change over time in response to his decisions or internal system dynamics. One problem is predicting that outputs will change in a linear fashion, even as the evidence grows for a non-linear, e.g., exponential function. An exponential variable may suddenly grow dramatically then equally suddenly reverse course when the limits on growth (resources) are reached. Internal time delays mean that the effects of a decision are not visible until some time in the future. Faced with poor results, unsuccessful test subjects implement or exhibit “massive countermeasures, ad hoc hypotheses that ignore the actual data, underestimations of growth processes, panic reactions, and ineffectual frenetic activity.” (p. 152) Successful subjects made an effort to understand the system's dynamics, kept notes (history) on system performance and tried to anticipate what would happen in the future.
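A worked illustration of the extrapolation trap Dörner describes (all numbers invented): fit a linear trend to the first few observations of a process that is actually growing exponentially toward a resource limit (a logistic curve), and the forecast first badly underestimates the growth spurt and then misses the reversal entirely.

    # Linear extrapolation of an (unrecognized) logistic process. Numbers are invented.
    import math

    def logistic(t, capacity=1000.0, rate=0.5, start=10.0):
        # Near-exponential growth early on, saturating at `capacity` when resources run out.
        return capacity / (1 + ((capacity - start) / start) * math.exp(-rate * t))

    observed = [logistic(t) for t in range(4)]   # the first few periods look tame
    step = observed[-1] - observed[-2]           # naive "linear" rate of change
    linear_forecast = lambda t: observed[3] + step * (t - 3)

    for t in (6, 9, 12):
        print(f"t={t:2d}: linear forecast {linear_forecast(t):6.1f}, actual {logistic(t):6.1f}")

By t=9 the linear forecast is a small fraction of the actual value, and by t=12 the real process has already begun to saturate while the forecast keeps climbing at the same old rate.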

Plan and execute actions, check results and adjust strategy

“The essence of planning is to think through the consequences of certain actions and see whether those actions will bring us closer to our desired goal.” (p. 153) Easier said than done in an environment of too many alternative courses of action and too little time. In rapidly evolving situations, it may be best to create rough plans and delegate as many implementing decisions as possible to subordinates. A major risk is thinking that planning has been so complete that the unexpected cannot occur. A related risk is the reflexive use of historically successful strategies. “As at Chernobyl, certain actions carried out frequently in the past, yielding only the positive consequences of time and effort saved and incurring no negative consequences, acquire the status of an (automatically applied) ritual and can contribute to catastrophe.” (p. 172)

In the sims, unsuccessful test subjects often exhibited “ballistic” behavior—they implemented decisions but paid no attention to, i.e., did not learn from, the results. Successful subjects watched for the effects of their decisions, made adjustments and learned from their mistakes.

Dörner identified several characteristics of people who tended to end up in a failure situation. They failed to formulate their goals, didn't recognize goal conflict or set priorities, and didn't correct their errors. (p. 185) Their ignorance of interrelationships among system variables and the longer-term repercussions of current decisions set the stage for ultimate failure.

Assessment

Dörner's insights and models have informed our thinking about human decision-making behavior in demanding, complicated situations. His use and promotion of simulation models as learning tools was one starting point for Bob Cudlin's work in developing a nuclear management training simulation program. Like Dörner, we see simulation as a powerful tool to “observe and record the background of planning, decision making, and evaluation processes that are usually hidden.” (pp. 9-10)

However, this book does not cover the entire scope of our interests. Dörner is a psychologist interested in individuals; group behavior is beyond his range. He alludes to normalization of deviance but his references appear limited to the flouting of safety rules rather than a more pervasive process of slippage. More importantly, he does not address behavior that arises from the system itself, in particular adaptive behavior as an open system reacts to and interacts with its environment.

From our view, Dörner's suggestions may help the individual decision maker avoid common pitfalls and achieve locally optimum answers. On the downside, following Dörner's prescription might lead the decision maker to an unjustified confidence in his overall system management abilities. In a truly complex system, no one knows how the entire assemblage works. It's sobering to note that even in Dörner's closed,** relatively simple models many test subjects still had a hard time developing a reasonable mental model, and some failed completely.

This book is easy to read and Dörner's insights into the psychological traps that limit human decision making effectiveness remain useful.


* D. Dörner, The Logic of Failure: Recognizing and Avoiding Error in Complex Situations, trans. R. and R. Kimber (Reading, MA: Perseus Books, 1998). Originally published in German in 1989.

** One simulation model had an external input.

Wednesday, December 12, 2012

“Overpursuit” of Goals

We return to a favorite subject, the impact of goals and incentives on safety culture and performance. Interestingly, this subject comes up in an essay by Oliver Burkeman, “The Power of Negative Thinking,”* which may seem unusual as most people think of goals and the achievement of goals as the product of a positive approach. Traditional business thinking is to set hard, quantitative goals, the bigger the better. But the future is inherently uncertain while goals generally are not. The counterintuitive argument suggests the most effective way to address future performance is to focus on worst case outcomes. Burkeman observes that “...rigid goals may encourage employees to cut ethical corners” and “Focusing on one goal at the expense of all other factors also can distort a corporate mission or an individual life…” and result in “...the ‘overpursuit’ of goals…” Case in point, yellow jerseys.

This raises some interesting points for nuclear safety. First we would remind our readers of Snowden’s Cynefin decision context framework, specifically his “complex” space which is indicative of where nuclear safety decisions reside. In this environment there are many interacting causes and effects, making it difficult or impossible to pursue specific goals along defined paths. Clearly an uncertain landscape. As Simon French argues: “Decision support will be more focused on exploring judgement and issues, and on developing broad strategies that are flexible enough to accommodate changes as the situation evolves.”** This would suggest the pursuit of specific, aspirational goals may be misguided or counterproductive.

Second, safety performance goals are hard to identify anyway. Is it the absence of bad outcomes? Or the maintenance of, say, a “strong” safety culture - whatever that is. One indication of the elusiveness of safety goals is their absence as targets in incentive programs. So there is probably little likelihood of overemphasizing safety performance as a goal. But is the same true for operational type goals such as capacity factor, refuel outage durations, and production costs? Can an overly strong focus on such short term goals, often associated with stretching performance, lead to overpursuit? What if large financial incentives are attached to the achievement of the goals?

The answer is not: “Safety is our highest priority”. More likely it is an approach that considers the complexity and uncertainty of nuclear operating space and the potential for hard goals to cut both ways. It might value how a management team prosecutes its responsibilities more than the outcome itself.


* O. Burkeman, “The Power of Negative Thinking,” Wall Street Journal online (Dec. 7, 2012).

** S. French, “Cynefin: repeatability, science and values,” Newsletter of the European Working Group “Multiple Criteria Decision Aiding,” series 3, no. 17 (Spring 2008) p. 2. We posted on Cynefin and French's paper here.

Wednesday, December 5, 2012

Drift Into Failure by Sidney Dekker

Sidney Dekker's Drift Into Failure* is a noteworthy effort to provide new insights into how accidents and other bad outcomes occur in large organizations. He begins by describing two competing world views: the essentially mechanical view of the world spawned by Newton and Descartes (among others), and a view based on complexity in socio-technical organizations and a systems approach. He shows how each world view biases the search for the “truth” behind how accidents and incidents occur.

Newtonian-Cartesian (N-C) Vision

Isaac Newton and Rene Descartes were leading thinkers during the dawn of the Age of Reason. Newton used the language of mathematics to describe the world while Descartes relied on the inner process of reason. Both believed there was a single reality that could be investigated, understood and explained through careful analysis and thought—complete knowledge was possible if investigators looked long and hard enough. The assumptions and rules that started with them, and were extended by others over time, have been passed on and most of us accept them, uncritically, as common sense, the most effective way to look at the world.

The N-C world is ruled by invariant cause-and-effect; it is, in fact, a machine. If something bad happens, then there was a unique cause or set of causes. Investigators search for these broken components, which could be physical or human. It is assumed that a clear line exists between the broken part(s) and the overall behavior of the system. The explicit assumption of determinism leads to an implicit assumption of time reversibility—because system performance can be predicted from time A if we know the starting conditions and the functional relationships of all components, then we can start from a later time B (the bad outcome) and work back to the true causes. (p. 84) Root cause analysis and criminal investigations are steeped in this world view.

In this view, decision makers are expected to be rational people who “make decisions by systematically and consciously weighing all possible outcomes along all relevant criteria.” (p. 3) Bad outcomes are caused by incompetent or worse, corrupt decision makers. Fixes include more communications, training, procedures, supervision, exhortations to try harder and criminal charges.

Dekker credits Newton et al for giving man the wherewithal to probe Nature's secrets and build amazing machines. However, Newtonian-Cartesian vision is not the only way to view the world, especially the world of complex, socio-technical systems. For that a new model, with different concepts and operating principles, is required.

The Complex System

Characteristics

The sheer number of parts does not make a system complex, only complicated. A truly complex system is open (it interacts with its environment), has components that act locally and don't know the full effects of their actions, is constantly making decisions to maintain performance and adapt to changing circumstances, and has non-linear interactions (small events can cause large results) because of multipliers and feedback loops. Complexity is a result of the ever-changing relationships between components. (pp.138-144)

Adding to the myriad information confronting a manager or observer, system performance is often optimized at the edge of chaos, where competitors are perpetually vying for relative advantage at an affordable cost.** The system is constantly balancing its efforts between exploration (which will definitely incur costs but may lead to new advantages) and exploitation (which reaps benefits of current advantages but will likely dissipate over time). (pp. 164-165)

The most important feature of a complex system is that it adapts to its environment over time in order to survive. And its environment is characterized by resource scarcity and competition. There is continuous pressure to maintain production and increase efficiency (and their visible artifacts: output, costs, profits, market share, etc.), so less visible outputs, e.g., safety, receive less attention. After all, “Though safety is a (stated) priority, operational systems do not exist to be safe. They exist to provide a service or product . . . .” (p. 99) And the cumulative effect of multiple adaptive decisions can be an erosion of safety margins and a changed response of the entire system. Such responses may be beneficial or harmful—a drift into failure.

Drift by a complex system exhibits several characteristics. First, as mentioned above, it is driven by environmental factors. Second, drift occurs in small steps, so changes are hardly noticed and may even be applauded if they result in local performance improvements; “. . . successful outcomes keep giving the impression that risk is under control” (p. 106) as a series of small decisions whittles away at safety margins. Third, these complex systems contain unruly technology (think deepwater drilling) where uncertainties exist about how the technology may ultimately be deployed and how it may fail. Fourth, there is significant interaction with a key environmental player, the regulator, and regulatory capture can occur, resulting in toothless oversight.

“Drifting into failure is not so much about breakdowns or malfunctioning of components, as it is about an organization not adapting effectively to cope with the complexity of its own structure and environment.” (p. 121) Drift, and occasionally accidents, occur because of ordinary system functioning: normal people going about their regular activities, making ordinary decisions “against a background of uncertain technology and imperfect information.” Accidents, like safety, can be viewed as an emergent system property, i.e., they are the result of system relationships but cannot be predicted by examining any particular system component.

Managers' Roles

Managers should not try to transform complex organizations into merely complicated ones, even if that were possible. Complexity is necessary for long-term survival because it maximizes organizational adaptability. The question is how to manage in a complex system. One key is increasing the diversity of personnel in the organization. More diversity means less groupthink, more creativity and a greater capacity for adaptation. In practice, this means validating minority opinions and encouraging dissent, reflecting on small decisions as they are made, stopping to ponder why some technical feature or process is not working exactly as expected, and creating slack to reduce the chances of small events snowballing into large failures. With proper guidance, organizations can drift their way to success.

Accountability

Amoral and criminal behavior certainly exists in large organizations, but bad outcomes can also result from normal system functioning. That's why the search for culprits (bad actors or broken parts) may not always be appropriate or adequate. This is a point Dekker has explored before, in Just Culture (briefly reviewed here), where he suggests using accountability as a means to understand the system-based contributors to failure and to resolve those contributors in a manner that will avoid recurrence.

Application to Nuclear Safety Culture

A commercial nuclear power plant or fleet is probably not a complete complex system. It interacts with environmental factors but in limited ways; it's certainly not directly exposed to the Wild West competition of, say, the cell phone industry. Groupthink and normalization of deviance*** are constant threats. The technology is reasonably well understood, but changes, e.g., uprates based on more software-intensive instrumentation and control, may be invisibly sanding away safety margin. Both the industry and the regulator would deny that regulatory capture has occurred, but an outside observer may think the relationship is a little too cozy. Overall, the fit is sufficiently good that students of safety culture should pay close attention to Dekker's observations.

In contrast, the Hanford Waste Treatment Plant (Vit Plant) is almost certainly a complex system and this book should be required reading for all managers in that program.

Conclusion

Drift Into Failure is not a quick read. Dekker spends a lot of time developing his theory, then circling back to further explain it or emphasize individual pieces. He reviews incidents (airplane crashes, a medical error resulting in patient death, software problems, public water supply contamination) and descriptions of organizational evolution (NASA, international drug smuggling, “conflict minerals” in Africa, drilling for oil, terrorist tactics, Enron) to illustrate how his approach yields broader and arguably more meaningful insights than the reports of official investigations. Standing on the shoulders of others, especially Diane Vaughan, Dekker gives us a rich model for what might be called the “banality of normalization of deviance.”


* S. Dekker, Drift Into Failure: From Hunting Broken Components to Understanding Complex Systems (Burlington VT: Ashgate 2011).

** See our Sept. 4, 2012 post on Cynefin for another description of how the decisions an organization faces can suddenly slip from the Simple space to the Chaotic space.

*** We have posted many times about normalization of deviance, the corrosive organizational process by which yesterday's “unacceptable” becomes today's “good enough.”

Tuesday, July 31, 2012

Regulatory Influence on Safety Culture

In September 2011 the Nuclear Energy Agency (NEA) and the International Atomic Energy Agency (IAEA) held a workshop for regulators and industry on oversight of licensee management.  “The principal aim of the workshop was to share experience and learning about the methods and approaches used by regulators to maintain oversight of, and influence, nuclear licensee leadership and management for safety, including safety culture.”*

Representatives from several countries made presentations.  For example, the U.S. presentation by NRC’s Valerie Barnes and INPO’s Ken Koves discussed work to define safety culture (SC) traits and correlate them to INPO principles and ROP findings (we previously reviewed this effort here).  Most other presentations also covered familiar territory. 

However, we were very impressed by Prof. Richard Taylor’s keynote address.  He is from the University of Bristol and has studied organizational and cultural factors in disasters and near-misses in both nuclear and non-nuclear contexts.  His list of common contributors includes issues with leadership, attitudes, environmental factors, competence, risk assessment, oversight, organizational learning and regulation.  He expounded on each factor with examples and additional detail. 

We found his conclusion most encouraging:  “Given the common precursors, we need to deepen our understanding of the complexity and interconnectedness of the socio-political systems at the root of organisational accidents.”  He suggests using system dynamics modeling to study archetypes including “maintaining visible convincing leadership commitment in the presence of commercial pressures.”  This is totally congruent with the approach we have been advocating for examining the effects of competing business and safety pressures on management. 

Unfortunately, this was the intellectual high point of the proceedings.  Topics that we believe are important to assessing and understanding SC got short shrift thereafter.  In particular, goal conflict, CAP and management compensation were not mentioned by any of the other presenters.

Decision making was mentioned by a few presenters but there was no substantive discussion of the topic (the U.K. presenter offered a motherhood statement that “Decisions at all levels that affect safety should be rational, objective, transparent and prudent”; the Barnes/Koves presentation appeared to focus on operational decision making).  A bright spot appeared in the meeting summary, where regulators identified better insight into licensees’ decision making processes as desirable and necessary.  And one suggestion for future research was “decision making in the face of competing goals.”  Perhaps there is hope after all.

(If this post seems familiar, last Dec 5 we reported on a Feb 2011 IAEA conference for regulators and industry that covered some of the same ground.  Seven months later the bureaucrats had inched the football a bit down the field.)


*  Proceedings of an NEA/IAEA Workshop, Chester, U.K. 26-28 Sept 2011, “Oversight and Influencing of Licensee Leadership and Management for Safety, Including Safety Culture – Regulatory Approaches and Methods,” NEA/CSNI/R(2012)13 (June 2012).

Friday, July 27, 2012

Modeling Safety Culture (Part 4): Simulation Results 2


As we introduced in our prior post on this subject (Results 1), we are presenting some safety culture simulation results based on a highly simplified model.  In that post we illustrated how management might react to business pressure caused by a reduction in authorized budget dollars.  Management's actions shift resources from safety to business and lead to changes in the state of safety culture.

In this post we continue with the same model and some other interesting scenarios.  In each of the following charts three outputs are plotted: safety culture in red, management action level in blue and business pressure in dark green.  The scenario is an organization with a somewhat lower initial safety culture confronted with a somewhat smaller budget reduction than in the example in Results 1.

Figure 1
Figure 1 shows an overly reactive management. The blue line shows management's actions in response to the changes in business pressure (green) associated with the budget change.  Management's actions are reactive, shifting priorities immediately and directly in response. This behavior leads to a cyclic outcome: management actions temporarily alleviate business pressure, but when actions are relaxed, pressure rises again, followed by another cycle of management response.  This could be a situation where management is not addressing the source of the problem, instead shifting priorities back and forth between business and safety.  Also of interest is that the magnitude of the cycle is actually increasing with time, indicating that the system is essentially unstable and unsustainable.  Safety culture (red) declines throughout the time frame.

Figure 2
Figure 2 shows the identical conditions but with management implementing a more restrained approach, delaying its response to changes in business pressure.  The overall system response is still cyclic, but now the magnitude of the cycles is decreasing and converging on a stable outcome.

Figure 3
Figure 3 is for the same conditions, but the management response is restrained further.  Management takes more time to assess the situation and respond to business pressure.  This approach starts to filter out the cyclic response seen in the first two examples and will eventually result in a lower business gap.

Perhaps the most important takeaway from these three simulations is that the total changes in safety culture are not significantly different.  A certain price is being paid for shifting priorities away from safety; however, the ability to reduce business pressure and keep it low is much better with the last management strategy.

Figure 4
The last example in this set is shown in Figure 4.  This is a situation where business pressure is gradually ramped up by a series of small step reductions in budget levels.  Within the simulation we have also set a limit on the extent of management actions.  Initially management takes no action to shift priorities because business pressure remains within a range that safety culture can resist.  Consequently safety culture remains stable.  After the third “bump” in business pressure, the threshold resistance of safety culture is broken and management starts to modestly shift priorities.  Even though business pressure continues to ramp up, management response is capped and does not “chase” closing the business gap.  As a result safety culture suffers only a modest reduction before stabilizing.  This scenario may be more typical of an organization with a fairly strong safety culture: under sufficient pressure it will make modest tradeoffs in priorities but will resist a significant compromise in safety.
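
For readers who want to experiment with the kind of feedback structure these simulations embody, here is a minimal sketch in Python.  To be clear, this is our own illustrative construction: the variable names, equations, thresholds and parameter values are assumptions invented for this example, not the actual model behind Figures 1 through 4.  The gain, delay and action_cap parameters roughly correspond to the reactive, restrained and capped management strategies discussed above.

# Minimal, hypothetical sketch of the feedback structure discussed above.
# All equations, names and parameter values are illustrative assumptions;
# this is not the model that produced Figures 1-4.

def simulate(steps=200, budget_cut=0.3, gain=1.0, delay=1, action_cap=None):
    """Return a list of (step, business_pressure, action, safety_culture)."""
    safety_culture = 0.7     # strength of safety culture, 0..1
    action = 0.0             # fraction of priority shifted from safety to business
    history = []
    pending = [0.0] * delay  # simple queue to delay management's perception

    for t in range(steps):
        # The budget cut creates a business gap; shifting priorities relieves part of it.
        business_pressure = max(0.0, budget_cut - action)

        # Management perceives pressure only after 'delay' steps.
        pending.append(business_pressure)
        perceived = pending.pop(0)

        # Pressure below the culture's resistance threshold is absorbed without action.
        desired = max(0.0, perceived - 0.2 * safety_culture)
        if action_cap is not None:
            desired = min(desired, action_cap)   # capped response, akin to the Figure 4 scenario

        # 'gain' controls how aggressively management closes the gap each step.
        action += gain * (desired - action)

        # Shifting priorities toward business slowly erodes safety culture;
        # slack (little or no action) lets it recover slightly.
        safety_culture += 0.01 * (0.02 - action)
        safety_culture = min(max(safety_culture, 0.0), 1.0)

        history.append((t, business_pressure, action, safety_culture))
    return history

# Aggressive, immediate response vs. a restrained, delayed and capped response.
reactive = simulate(gain=1.0, delay=1)
restrained = simulate(gain=0.2, delay=10, action_cap=0.15)

Varying gain, delay and action_cap shows how quickly priorities shift and how much the safety culture variable erodes under different management strategies; reproducing the figures themselves would, of course, require the real model's equations and calibrated parameters.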

Friday, July 20, 2012

Cognitive Dissonance at Palisades

“Cognitive dissonance” is the tension that arises from holding two conflicting thoughts in one’s mind at the same time.  Here’s a candidate example: a single brief document that presents two different perspectives on safety culture issues at Palisades.

On June 26, 2012, the NRC requested information on Palisades’ safety culture issues, including the results of a 2012 safety culture assessment conducted by an outside firm, Conger & Elsea, Inc. (CEI).  In reply, on July 9, 2012 Entergy submitted a cover letter and the executive summary of the CEI assessment.*  The cover letter says “Areas for Improvement (AFIs) identified by CEI overlapped many of the issues already identified by station and corporate leadership in the Performance Recovery Plan. Because station and corporate management were implementing the Performance Recovery Plan in April 2012, many of the actions needed to address the nuclear safety culture assessment were already under way.”

Further, “Gaps identified between the station Performance Recovery Plan and the safety culture assessment are being addressed in a Safety Culture Action Plan. . . . [which is] a living document and a foundation for actively engaging station workers to identify, create and complete other actions deemed to be necessary to improve the nuclear safety culture at PNP.”

Seems like management has matters in hand.  But let’s look at some of the issues identified in the CEI assessment.

“. . . important decision making processes are governed by corporate procedures. . . .  However, several events have occurred in recent Palisades history in which deviation from those processes contributed to the occurrence or severity of an event.”

“. . . there is a lack of confidence and trust by the majority of employees (both staff and management) at the Plant in all levels of management to be open, to make the right decisions, and to really mean what they say. This is indicated by perceptions [of] the repeated emphasis of production over safety exhibited through decisions around resources.” [emphasis added]

“There is a lack in the belief that Palisades Management really wants problems or concerns reported or that the issues will be addressed. The way that CAP is currently being implemented is not perceived as a value added process for the Plant.”

The assessment also identifies issues related to Safety Conscious Work Environment and accountability throughout the organization.

So management is implying things are under control, but the assessment identified serious issues.  As our Bob Cudlin has been explaining in his series of posts on decision making, the pressures associated with goal conflict permeate an entire organization and the problems that arise cannot be fixed overnight.  In addition, there’s no reason for a plant to have an ineffective CAP, but if the CAP isn’t working, that’s not going to be quickly fixed either.


*  Letter, A.J. Vitale to NRC, “Reply to Request for Information” (July 9, 2012), ADAMS ML12193A111.