Showing posts with label Dekker. Show all posts
Showing posts with label Dekker. Show all posts

Wednesday, October 9, 2019

More on Mental Models in Healthcare

Source: Clipart Panda
Our August 6, 2019 post discussed the appalling incidence of preventable harm in healthcare settings.  We suggested that a better mental model of healthcare delivery could contribute to reducing the incidence of preventable harm.  It will come as no surprise to Safetymatters readers that we are referring to a systems-oriented model.

We’ll use a 2014 article* by Nancy Leveson and Sidney Dekker to describe how a systems approach can lead to better understanding of why accidents and other negative outcomes occur.  The authors begin by noting that 70-90% of industrial accidents are blamed on individual workers.**  As a consequence, proposed fixes focus on disciplining, firing, or retraining individuals or, for groups, specifying their work practices in ever greater detail (the authors call this “rigidifying” work).  This is the Safety I mental model in a nutshell, limiting its view to the “what” and “who” of incidents.   

In contrast, systems thinking posits the behavior of individuals can only be understood by examining the context in which their behavior occurs.  The context includes management decision-making and priorities, regulatory requirements and deficiencies, and of course, organizational culture, especially safety culture.  Fixes that don’t consider the overall process almost guarantee that similar problems will arise in the future.  “. . . human error is a symptom of a system that needs to be redesigned.”  Systems thinking adds the “why” to incident analysis.

Every system has a designer, although they may not be identified as such and may not even be aware they’re “designing” when they specify work steps or flows, or define support processes, e.g., procurement or quality control.  Importantly, designers deal with an ideal system, not with the actual constructed system.  The actual system may differ from the designer's original specification because of inherent process variances, the need to address unforeseen conditions, or evolution over time.  Official procedures may be incomplete, e.g., missing unlikely but possible conditions or assume that certain conditions cannot occur.  However, the people doing the work must deal with the  constructed system, however imperfect, and the conditions that actually occur.

The official procedures present a doubled-edged threat to employees.  If they adapt procedures in the face of unanticipated conditions, and the adaptation turns out to be ineffective or leads to negative outcomes, employees can be blamed for not following the procedures.  On the other hand, if they stick to the procedures when conditions suggest they should be adapted and negative outcomes occur, the employees can be blamed for too rigidly following them.

Personal blame is a major problem in System I.  “Blame is the enemy of safety . . . it creates a culture where people are afraid to report mistakes . . . A safety culture that focuses on blame will never be very effective in preventing accidents.”

Our Perspective

How does the above relate to reducing preventable harm in healthcare?  We believe that structural and cultural factors impede the application of systems thinking in the healthcare field.  It keeps them stuck in a Safety I worldview no matter how much they pretend otherwise. 

The hospital as formal bureaucracy

When we say “healthcare” we are referring to a large organization that provides medical care, a hospital is the smallest unit of analysis.  A hospital is literally a textbook example of what organizational theorists call a formal bureaucracy.  It has specialized departments with an official division of authority among them—silos are deliberately created and maintained.  An administrative hierarchy mediates among the silos and attempts to guide them toward overall goals. The organization is deliberately impersonal to avoid favoritism and behavior is prescribed, proscribed and guided by formal rules and procedures.  It appears hospitals were deliberately designed to promote System I thinking and its inherent bias for blaming the individual for negative outcomes.

Employees have two major strategies for avoiding blame: strong occupational associations and plausible deniability. 

Powerful guilds and unions 


Medical personnel are protected by their silo and tribe.  Department heads defend their employees (and their turf) from outsiders.  The doctors effectively belong to a guild that jealously guards their professional authority; the nurses and other technical fields have their unions.  These unofficial and official organizations exist to protect their members and promote their interests.  They do not exist to protect patients although they certainly tout such interest when they are pushing for increased employee headcounts.  A key cultural value is members do not rat on other members of their tribe so problems may be observed but go unreported.

Hiding behind the procedures

In this environment, the actual primary goal is to conform to the rules, not to serve clients.  The safest course for the individual employee is to follow the rules and procedures, independent of the effect this may have on a patient.  The culture espouses a value of patient safety but what gets a higher value is plausible deniability, the ability to avoid personal responsibility, i.e., blame, by hiding behind the established practices and rules when negative outcomes occur.

An enabling environment 


The environment surrounding healthcare allows them to continue providing a level of service that literally kills patients.  Data opacity means it’s very difficult to get reliable information on patient outcomes.  Hospitals with high failure rates simply claim they are stuck with or choose to serve the sickest patients.  Weak malpractice laws are promoted by the doctors’ guild and maintained by the politicians they support.  Society in general is overly tolerant of bad medical outcomes.  Some families may make a fuss when a relative dies from inadequate care but settlements are paid, non-disclosure agreements are signed, and the enterprise moves on.

Bottom line: It will take powerful forces to get the healthcare industry to adopt true systems-oriented thinking and identify the real reasons why preventive harm occurs and what corrective actions could be effective.  Healthcare claims to promote evidence-based medicine; they need to add evidence-based harm reduction strategies.  Industry-wide adoption of the aviation industry’s confidential reporting system for errors would be a big step forward.    


*  N. Leveson and S. Dekker, “Get To The Root Of Accidents,” ChemicalProcessing.com (Feb 27, 2014).  Retrieved Oct. 7, 2019.  Leveson is an MIT professor and long-standing champion of systems thinking; Dekker has written extensively on Just Culture and Safety II concepts.  Click on their respective labels to pull up our other posts on their work.

**  The article is tailored for the process industry but the same thinking can be applied to service industries.

Monday, October 29, 2018

Safety Culture: What are the Contributors to “Bad” Outcomes Versus “Good” Outcomes and Why Don’t Some Interventions Lead to Improved Safety Performance?

Why?
Sidney Dekker recently revisited* some interesting research he led at a large health care authority.  The authority’s track record was not atypical for health care: 1 out of 13 (7%) patients was hurt in the process of receiving care.  The authority investigated the problem cases and identified a familiar cluster of negative factors, including workarounds, shortcuts, violations, guidelines not followed, errors and miscalculations—the list goes on.  The interventions will also be familiar to you—identify who did what wrong, add more rules, try harder and get rid of bad apples—but were not reducing the adverse event rate.

Dekker’s team took a different perspective and looked at the 93% of patients who were not harmed.  What was going on in their cases?  To their surprise, the team found the same factors: workarounds, shortcuts, violations, guidelines not followed, errors and miscalculations, etc.** 

Dekker uses this research to highlight a key difference between the traditional view of safety management, Safety I, and the more contemporary view, Safety II.  At its heart, Safety I believes the source of problems lies with the individual so interventions focus on ways to make the individual’s work behavior more reliable, i.e., less likely to deviate from the idealized form specified by work designers.  Safety I ignores the fact that the same imperfections exist in work with both successful and problematic outcomes.

In contrast, Safety II sees the source of problems in the system, the dynamic combination of technology, environmental factors, organizational aspects, and individual cognition and choices.  Referencing the work of Diane Vaughan, Dekker says “the interior life of organizations is always messy, only partially well-coordinated and full of adaptations, nuances, sacrifices and work that is done in ways that is quite different from any idealized image of it.”

Revisiting the data revealed that the work with good outcomes was different.  This work had more positive characteristics, including diversity of professional opinion and the possibility to voice dissent, keeping the discussion on risk alive and not taking past success as a guarantee for safety, deference to proven expertise, widely held authority to say “stop,” and pride of workmanship.  As you know, these are important characteristics of a strong safety culture.

Our Perspective

Dekker’s essay is a good introduction to the differences between Safety I and Safety II thinking, most importantly their differing mental models of the way work is actually performed in organizations.  In Safety I, the root cause of imperfect results is the individual and constant efforts are necessary (e.g., training, monitoring, leadership, discipline) to create and maintain the individual’s compliance with work as designed.  In  Safety II, normal system functioning leads to mostly good and occasionally bad results.  The focus of Safety II interventions should be on activities that increase individual capacity to affect system performance and/or increase system robustness, i.e., error tolerance and an increased chance of recovery when errors occur.

If one applies Safety I thinking to a “bad” outcome then the most likely result from an effective intervention is that the exact same problem will not happen again.  This thinking sustains a robust cottage industry in root-cause analysis because new problems will always arise and no changes are made to the system itself.

We like Dekker’s (and Vaughan’s) work and have reported on it several times in Safetymatters (click on the Dekker and Vaughan labels to bring up related posts).  We have been emphasizing some of the same points, especially the need for a systems view, since we started Safetymatters almost ten years ago.

Individual Exercise: Again drawing on Vaughan, Dekker says “there is often no discernable difference between the organization that is about to have an accident or adverse event, and the one that won’t, or the one that just had one.”  Look around your organization and review your career experience; is that true?


*  S. Dekker, “Why Do Things Go Right?,” SafetyDifferently website (Sept. 28, 2018).  Retrieved Oct. 25, 2018.

**  This is actually rational.  People operate on feedback and if the shortcuts, workarounds and disregarding the guidelines did not lead to acceptable (or at least tolerable) results most of the time, folks would stop using them.

Monday, October 16, 2017

Nuclear Safety Culture: A Suggestion for Integrating “Just Culture” Concepts

All of you have heard of “Just Culture” (JC).  At heart, it is an attitude toward investigating and explaining errors that occur in organizations in terms of “why” an error occurred, including systemic reasons, rather than focusing on identifying someone to blame.  How might JC be applied in practice?  A paper* by Shem Malmquist describes how JC concepts could be used in the early phases of an investigation to mitigate cognitive bias on the part of the investigators.

The author asserts that “cognitive bias has a high probability of occurring, and becoming integrated into the investigators subconscious during the early stages of an accident investigation.” 

He recommends that, from the get-go, investigators categorize all pertinent actions that preceded the error as an error (unintentional act), at-risk behavior (intentional but for a good reason) or reckless (conscious disregard of a substantial risk or intentional rule violation). (p. 5)  For errors or at-risk actions, the investigator should analyze the system, e.g., policies, procedures, training or equipment, for deficiencies; for reckless behavior, the investigator should determine what system components, if any, broke down and allowed the behavior to occur. (p. 12).  Individuals should still be held responsible for deliberate actions that resulted in negative consequences.

Adding this step to a traditional event chain model will enrich the investigation and help keep investigators from going down the rabbit hole of following chains suggested by their own initial biases.

Because JC is added to traditional investigation techniques, Malmquist believes it might be more readily accepted than other approaches for conducting more systemic investigations, e.g., Leveson’s System Theoretic Accident Model and Processes (STAMP).  Such approaches are complex, require lots of data and implementing them can be daunting for even experienced investigators.  In our opinion, these models usually necessitate hiring model experts who may be the only ones who can interpret the ultimate findings—sort of like an ancient priest reading the entrails of a sacrificial animal.  Snide comment aside, we admire Leveson’s work and reviewed it in our Nov. 11, 2013 post.

Our Perspective

This paper is not some great new insight into accident investigation but it does describe an incremental step that could make traditional investigation methods more expansive in outlook and robust in their findings.

The paper also provides a simple introduction to the works of authors who cover JC or decision-making biases.  The former category includes Reason and Dekker and the latter one Kahneman, all of whom we have reviewed here at Safetymatters.  For Reason, see our Nov. 3, 2014 post; for Dekker, see our Aug. 3, 2009 and Dec. 5, 2012 posts; for Kahneman, see our Nov. 4, 2011 and Dec. 18, 2013 posts.

Bottom line: The parts describing and justifying the author’s proposed approach are worth reading.  You are already familiar with much of the contextual material he includes.  


*  S. Malmquist, “Just Culture Accident Model – JCAM” (June 2017).

Thursday, August 10, 2017

Nuclear Safety Culture: The Threat of Bureaucratization

We recently read Sidney Dekker’s 2014 paper* on the bureaucratization of safety in organizations.  It’s interesting because it describes a very common evolution of organizational practices, including those that affect safety, as an organization or industry becomes more complicated and formal over time.  Such evolution can affect many types of organizations, including nuclear ones.  Dekker’s paper is summarized below, followed by our perspective on it. 

The process of bureaucratization is straightforward; it involves hierarchy (creating additional layers of organizational structure), specialized roles focusing on “safety related” activities, and the application of rules for defining safety requirements and the programs to meet them.  In the safety space, the process has been driven by multiple factors, including legislation and regulation, contracting and the need for a uniform approach to managing large groups of organizations, and increased technological capabilities for collection and analysis of data.

In a nutshell, bureaucracy means greater control over the context and content of work by people who don’t actually have to perform it.  The risk is that as bureaucracy grows, technical expertise and operational experience may be held in less value.

This doesn’t mean bureaucracy is a bad thing.  In many environments, bureaucratization has led to visible benefits, primarily a reduction in harmful incidents.  But it can lead to unintended, negative consequences including:

  • Myopic focus on formal performance measures (often quantitative) and “numbers games” to achieve the metrics and, in some cases, earn financial bonuses,
  • An increasing inability to imagine, much less plan for, truly novel events because of the assumption that everything bad that might happen has already been considered in the PRA or the emergency plan.  (Of course, these analyses/documents are created by siloed specialists who may lack a complete understanding of how the socio-technical system works or what might actually be required in an emergency.  Fukushima anyone?),
  • Constraints on organizational members’ creativity and innovation, and a lack of freedom that can erode problem ownership, and
  • Interest, effort and investment in sustaining, growing and protecting the bureaucracy itself.
Our Perspective

We realize reading about bureaucracy is about as exciting as watching a frog get boiled.  However, Dekker does a good job of explaining how the process of bureaucratization takes root and grows and the benefits that can result.  He also spells out the shortcomings and unintended consequences that can accompany it.

The commercial nuclear world is not immune to this process.  Consider all the actors who have their fingers in the safety pot and realize how few of them are actually responsible for designing, maintaining or operating a plant.  Think about the NRC’s Reactor Oversight Process (ROP) and the licensees’ myopic focus on keeping a green scorecard.  Importantly, the Safety Culture Policy Statement (SCPS) being an “expectation” resists the bureaucratic imperative to over-specify.  Instead, the SCPS is an adjustable cudgel the NRC uses to tap or bludgeon wayward licensees into compliance.  Foreign interest in regulating nuclear safety culture will almost certainly lead to its increased bureaucratization.  

Bureaucratization is clearly evident in the public nuclear sector (looking at you, Department of Energy) where contractors perform the work and government overseers attempt to steer the contractors toward meeting production goals and safety standards.  As Dekker points out, managing, monitoring and controlling operations across an organizational network of contractors and sub-contractors tends to be so difficult that bureaucratized accountability becomes the accepted means to do so.

We have presented Dekker’s work before, primarily his discussion of a “just culture” (reviewed Aug. 3, 2009) that tries to learn from mishaps rather than simply isolating and perhaps punishing the human actor(s) and “drift into failure” (reviewed Dec. 5, 2012) where a socio-technical system can experience unacceptable performance caused by systemic interactions while functioning normally.  Stakeholders can mistakenly believe the system is completely safe because no errors have occurred while in reality the system can be slipping toward an incident.  Both of these attributes should be considered in your mental model of how your organization operates.

Bottom line: This is an academic paper in a somewhat scholarly journal, in other words, not a quick and easy read.  But it’s worth a look to get a sense of how the tentacles of formality can wrap themselves around an organization.  In the worse case, they can stifle the capabilities the organization needs to successfully react to unexpected events and environmental changes.


*  S.W.A. Dekker, “The bureaucratization of safety,” Safety Science 70 (2014), pp. 348–357.  We saw this paper on Safety Differently, a website that publishes essays on safety.  Most of the site’s content appears related to industries with major industrial safety challenges, e.g., mining.

Friday, January 27, 2017

Leadership, Decisions, Systems Thinking and Nuclear Safety Culture

AcciMap Excerpt
We recently read a paper* that echoes some of the themes we emphasize on Safetymatters, viz., leadership, decisions and a systems view.  Following is an excerpt from the abstract:

Leadership is progressively being recognized as a key** factor in supporting successful performance across a range of domains. . . . the decisions and actions that characterize safety leadership thus become important emergent properties in the prevention of incidents, which should be considered within the context of the broader organizational system and not merely constrained to understanding events or conditions that shape performance at the ‘sharp end’.”  [emphasis added]

The authors go on to analyze decisions and actions after a mining incident (landslide) using a combination of three different schemes: Rasmussen’s Risk Management Framework (RMF) and corresponding AcciMap, and the Critical Decision Method (CDM).

The RMF describes work systems as comprised of various levels and argues that safety performance is affected by decisions and actions at all levels from politicians in the external environment down through company executives and managers and finally to individual workers.  Rasmussen’s AcciMap is an expansive causal diagram for an accident or incident that displays the contributions (or omissions) at each level in the RMF and their connections.

CDM uses semi-structured interviews to obtain information about how individuals formulate their decisions, including context such as background knowledge and immediate influencing factors.  Consistent with the RMF, case study interviews were conducted with individuals at different organizational levels.  CDM data were used to construct the AcciMap.

We won’t go into the details of the analysis but it identified over a dozen key decisions made at different organizational levels before and during the incident; most were connected to at least one other key decision.  The AcciMap illustrates decisions and communications across multiple levels and thus provides a useful picture of how an organization anticipates and responds to an unusual situation.

Our Perspective

The authors argue, and we agree, that this type of analysis provides greater detail and insight into the performance of an organization’s safety management system than traditional accident investigations (especially those focused on finding someone to blame).

This article does not specifically discuss culture.  But the body of decisions an organization produces is the strongest evidence and most visible artifact of its culture.  Organizational decisions are far more important than responses to surveys or interviews where people can report what they believe (or hope) the culture is, or what they think their audience wants to hear.

We like that RMF and AcciMap are agnostic: they can be used to analyze either “what went wrong” or “what went right” scenarios.  (The case study was in the latter category because no one was hurt in the incident.)  If an assessor is looking at a sample of decisions to infer a nuclear organization’s culture, most of those decisions will have had positive (or at least no negative) consequences.

The authors are Australian academics but this short (8 pages total) paper is quite readable and a good introduction to CDM and Rasmussen’s constructs.  The references include people whose work we have positively reviewed on Safetymatters, including Dekker, Hollnagel, Leveson and Reason.

Bottom line: There is nothing about culture or nuclear here, but the overall message reinforces our beliefs about how to think about Nuclear Safety Culture.


*  S-L Donovana, P.M. Salmonb and M.G. LennĂ©a, “The leading edge: A systems thinking methodology for assessing safety leadership,” Procedia Manufacturing 3 (2015), pp. 6644–6651.  Available at sciencedirect.com; retrieved Jan. 19, 2017.

**  Note they do not say “one and only” or even “most important.”

Monday, April 13, 2015

Safety-I and Safety-II: The Past and Future of Safety Management by Erik Hollnagel

This book* discusses two different ways of conceptualizing safety performance problems (e.g., near-misses, incidents and accidents) and safety management in socio-technical systems.  This post describes each approach and provides our perspective on Hollnagel’s efforts.  As usual, our interest lies in the potential value new ways of thinking can offer to the nuclear industry.

Safety-I

This is the common way of looking at safety performance problems.  It is reactive, i.e., it waits for problems to arise** and analytic, e.g., it uses specific methods to work back from the problem to its root causes.  The key assumption is that something in the system has failed or malfunctioned and the purpose of an investigation is to identify the causes and correct them so the problem will not recur.  A second assumption is that chains of causes and effects are linear, i.e., it is actually feasible to start with a problem and work back to its causes.  A third assumption is that a single solution (the “first story”) can be found. (pp. 86, 175-76)***  Underlying biases include the hindsight bias (p. 176) and the belief that the human is usually the weak link. (pp. 78-79)  The focus of safety management is minimizing the number of things that go wrong.

Our treatment of Safety-I is brief because we have reported on criticism of linear thinking/models elsewhere, primarily in the work of Dekker, Woods et al, and Leveson.  See our posts of Dec. 5, 2012; July 6, 2013; and Nov. 11, 2013 for details.

Safety-II

Safety-II is proposed as a different way to look at safety performance.  It is proactive, i.e., it looks at the ways work is actually performed on a day-to-day basis and tries to identify causes of performance variability and then manage them.  A key cause of variability is the regular adjustments people make in performing their jobs in order to keep the system running.  In Hollnagel’s view, “Finding out what these [performance] adjustments are and trying to learn from them can be more important than finding the causes of infrequent adverse outcomes!” (p. 149)  The focus of safety management is on increasing the likelihood that things will go right and developing “the ability to succeed under varying conditions, . . .” (p. 137).

Performance is variable because, among other reasons, people are always making trade-offs between thoroughness and efficiency.  They may use heuristics or have to compensate for something that is missing or take some steps today to avoid future problems.  The underlying assumption of Safety-II is that the same behaviors that almost always lead to successful outcomes can occasionally lead to problems because of performance variability that goes beyond the boundary of the control space.  A second assumption is that chains of causes and effects may be non-linear, i.e., a small variance may lead to a large problem, and may have an emergent aspect where a specific performance variability may occur then disappear or the Swiss cheese holes may momentarily line up exposing the system to latent hazards. (pp. 66, 131-32)  There may be multiple explanations (“second stories”) for why a particular problem occurred.  Finally, Safety-II accepts that there are often differences between Work-as-Imagined (esp. as imagined by folks at the blunt end) and Work-as-Done (by people at the sharp end). (pp. 40-41)***

The Two Approaches

Safety-I and Safety-II are not in some winner-take-all competitive struggle.  Hollnagel notes there are plenty of problems for which a Safety-I investigation is appropriate and adequate. (pp. 141, 146)

Safety-I expenditures are viewed as a cost (to reduce errors). (p. 57)  In contrast, Safety-II expenditures are viewed as bona fide investments to create more correct outcomes. (p. 166)

In all cases, organizational factors, such as safety culture, can impact safety performance and organizational learning. (p. 31)

Our Perspective

The more complex a socio-technical entity is, the more it exhibits emergent properties and the more appropriate Safety-II thinking is.  And nuclear has some elements of complexity.****  In addition, Hollnagel notes that a common explanation for failures that occur in a System-I world is “it was never imagined something like that could happen.” (p. 172)  To avoid being the one in front of the cameras saying that, it might be helpful for you to spend a little time reflecting on how System-II thinking might apply in your world.

Why do most things go right?  Is it due to strict compliance with procedures?  Does personal creativity or insight contribute to successful plant performance?  Do you talk with your colleagues about possible efficiency-thoroughness trade-offs (short cuts) that you or others make?  Can thinking about why things go right make one more alert to situations where things are heading south?  Does more automation (intended to reduce reliance on fallible humans) actually move performance closer to the control boundary because it removes the human’s ability to make useful adjustments?  Has any of your root cause evaluations appeared to miss other plausible explanations for why a problem occurred?

Some of the Safety-II material is not new.  Performance variability in Safety-II builds on Hollnagel’s earlier work on the efficiency-thoroughness trade-off (ETTO) principle.  (See our Jan. 3, 2013 post.)   His call for mindfulness and constant alertness to problems is straight out of the High Reliability Organization playbook. (pp. 36, 163-64)  (See our May 3, 2013 post.)

A definite shortcoming is the lack of concrete examples in the Safety-II discussion.  If someone has tried to do this, it would be nice to hear about it.

Bottom line, Hollnagel has some interesting observations although his Safety-II model is probably not the Next Big Thing for nuclear safety management.

 

*  E. Hollnagel, Safety-I and Safety-II: The Past and Future of Safety Management  (Burlington, VT: Ashgate , 2014)

**  In the author’s view, forward-looking risk analysis is not proactive because it is infrequently performed. (p. 57) 

***  There are other assumptions in the Safety-I approach (see pp. 97-104) but for the sake of efficiency, they are omitted from this post.

****  Nuclear power plants have some aspects of a complex socio-technical system but other aspects are merely complicated.   On the operations side, activities are tightly coupled (one attribute of complexity) but most of the internal organizational workings are complicated.  The lack of sudden environmental disrupters (excepting natural disasters) means they have time to adapt to changes in their financial or regulatory environment, reducing complexity.

Monday, November 3, 2014

A Life In Error by James Reason



Most of us associate psychologist James Reason with the “Swiss Cheese Model” of defense in depth or possibly the notion of a “just culture.”  But his career has been more than two ideas, he has literally spent his professional life studying errors, their causes and contexts.  A Life In Error* is an academic memoir, recounting his study of errors starting with the individual and ending up with the organization (the “system”) including its safety culture (SC).  This post summarizes relevant portions of the book and provides our perspective.  It is going to read like a sub-titled movie on fast-forward but there are a lot of particulars packed in this short (124 pgs.) book. 

Slips and Mistakes 

People make plans and take action, consequences follow.  Errors occur when the intended goals are not achieved.  The plan may be adequate but the execution faulty because of slips (absent-mindedness) or trips (clumsy actions).  A plan that was inadequate to begin with is a mistake which is usually more subtle than a slip, and may go undetected for long periods of time if no obviously bad consequences occur. (pp. 10-12)  A mistake is a creation of higher-level mental activity than a slip.  Both slips and mistakes can take “strong but wrong” forms, where schema** that were effective in prior situations are selected even though they are not appropriate in the current situation.

Absent-minded slips can occur from misapplied competence where a planned routine is sidetracked into an unplanned one.  Such diversions can occur, for instance, when one’s attention is unexpectedly diverted just as one reaches a decision point and multiple schema are both available and actively vying to be chosen. (pp. 21-25)  Reason’s recipe for absent-minded errors is one part cognitive under-specification, e.g., insufficient knowledge, and one part the existence of an inappropriate response primed by prior, recent use and the situational conditions. (p. 49) 

Planning Biases 

The planning activity is subject to multiple biases.  An individual planner’s database may be incomplete or shaped by past experiences rather than future uncertainties, with greater emphasis on past successes than failures.  Planners can underestimate the influence of chance, overweight data that is emotionally charged, be overly influenced by their theories, misinterpret sample data or miss covariations, suffer hindsight bias or be overconfident.***  Once a plan is prepared, planners may focus only on confirmatory data and are usually resistant to changing the plan.  Planning in a group is subject to “groupthink” problems including overconfidence, rationalization, self-censorship and an illusion of unanimity.  (pp. 56-62)

Errors and Violations 

Violations are deliberate acts to break rules or procedures, although bad outcomes are not generally intended.  Violations arise from various motivational factors including the organizational culture.  Types of violations include corner-cutting to avoid clumsy procedures, necessary violations to get the job done because the procedures are unworkable, adjustments to satisfy conflicting goals and one-off actions (such as turning off a safety system) when faced with exceptional circumstances.  Violators perform a type of cost:benefit analysis biased by the fact that benefits are likely immediate while costs, if they occur, are usually somewhere in the future.  In Reason’s view, the proper course for the organization is to increase the perceived benefits of compliance not increase the costs (penalties) for violations.  (There is a hint of the “just culture” here.) 

Organizational Accidents 

Major accidents (TMI, Chernobyl, Challenger) have three common characteristics: contributing factors that were latent in the system, multiple levels of defense, and an unforeseen combination of latent factors and active failures (errors and/or violations) that defeated the defenses.  This is the well-known Swiss Cheese Model with the active failures opening short-lived holes and latent failures creating longer-lasting but unperceived holes.

Organizational accidents are low frequency, high severity events with causes that may date back years.  In contrast, individual accidents are more frequent but have limited consequences; they arise from slips, trips and lapses.  This is why organizations can have a good industrial accident record while they are on the road to a large-scale disaster, e.g., BP at Texas City. 

Organizational Culture 

Certain industries, including nuclear power, have defenses-in-depth distributed throughout the system but are vulnerable to something that is equally widespread.  According to Reason, “The most likely candidate is safety culture.  It can affect all elements in a system for good or ill.” (p. 81)  An inadequate SC can undermine the Swiss Cheese Model: there will be more active failures at the “sharp end”; more latent conditions created and sustained by management actions and policies, e.g., poor maintenance, inadequate equipment or downgrading training; and the organization will be reluctant to deal proactively with known problems. (pp. 82-83)

Reason describes a “cluster of organizational pathologies” that make an adverse system event more likely: blaming sharp-end operators, denying the existence of systemic inadequacies, and a narrow pursuit of production and financial goals.  He goes on to list some of the drivers of blame and denial.  The list includes: accepting human error as the root cause of an event; the hindsight bias; evaluating prior decisions based on their outcomes; shooting the messenger; belief in a bad apple but not a bad barrel (the system); failure to learn; a climate of silence; workarounds that compensate for systemic inadequacies’ and normalization of deviance.  (pp. 86-92)  Whew! 

Our Perspective 

Reason teaches us that the essence of understanding errors is nuance.  At one end of the spectrum, some errors are totally under the purview of the individual, at the other end they reside in the realm of the system.  The biases and issues described by Reason are familiar to Safetymatters readers and echo in the work of Dekker, Hollnagel, Kahneman and others.  We have been pounding the drum for a long time on the inadequacies of safety analyses that ignore systemic issues and corrective actions that are limited to individuals (e.g., more training and oversight, improved procedures and clearer expectations).

The book is densely packed with the work of a career.  One could easily use the contents to develop a SC assessment or self-assessment.  We did not report on the chapters covering research into absent-mindedness, Freud and medical errors (Reason’s current interest) but they are certainly worth reading.

Reason says this book is his valedictory: “I have nothing new to say and I’m well past my prime.” (p. 122)  We hope not.


*  J. Reason, A Life In Error: From Little Slips to Big Disasters (Burlington, VT: Ashgate, 2013).

**  Knowledge structures in long-term memory. (p. 24)

***  This will ring familiar to readers of Daniel Kahneman.  See our Dec. 18, 2013 post on Kahneman’s Thinking, Fast and Slow.

Monday, October 13, 2014

Systems Thinking in Air Traffic Management


A recent white paper* presents ten principles to consider when thinking about a complex socio-technical system, specifically European Air Traffic Management (ATM).  We review the principles below, highlighting aspects that might provide some insights for nuclear power plant operations and safety culture (SC).

Before we start, we should note that ATM is truly a complex** system.  Decisions involving safety and efficiency occur on a continuous basis.  There is always some difference between work-as-imagined and work-as-done.

In contrast, we have argued that a nuclear plant is a complicated system but it has some elements of complexity.  To the extent complexity exists, treating nuclear like a complicated machine via “analysing components using reductionist methods; identifying ‘root causes’ of problems or events; thinking in a linear and short-term way; . . . [or] making changes at the component level” is inadequate. (p. 5)  In other words, systemic factors may contribute to observed performance variability and frustrate efforts to achieve the goal in nuclear of eliminating all differences between work-as-planned and work-as-done.

Principles 1-3 relate to the view of people within systems – our view from the outside and their view from the inside.

1. Field Expert Involvement
“To understand work-as-done and improve how things really work, involve those who do the work.” (p. 8)
2. Local Rationality
“People do things that make sense to them given their goals, understanding of the situation and focus of attention at that time.” (p. 10)
3. Just Culture
“Adopt a mindset of openness, trust and fairness. Understand actions in context, and adopt systems language that is non-judgmental and non-blaming.” (p. 12)

Nuclear is pretty good at getting line personnel involved.  Adages such as “Operations owns the plant” are useful to the extent they are true.  Cross-functional teams can include operators or maintenance personnel.  An effective CAP that allows workers to identify and report problems with equipment, procedures, etc. is good; an evaluation and resolution process that involves members from the same class of workers is even better.  Having someone involved in an incident or near-miss go around to the tailgates and classes to share the lessons learned can be convincing.

But when something unexpected or bad happens, nuclear tends to spend too much time looking for the malfunctioning component (usually human).   “The assumption is that if the person would try harder, pay closer attention, do exactly what was prescribed, then things would go well. . . . [But a] focus on components becomes less effective with increasing system complexity and interactivity.” (p. 4)  An outside-in approach ignores the context in which the human performed, the information and time available, the competition for focus of attention, the physical conditions of the work, fatigue, etc.  Instead of insight into system nuances, the result is often limited to more training, supervision or discipline.

The notion of a “just culture” comes from James Reason.  It’s a culture where employees are not punished for their actions, omissions or decisions that are commensurate with their experience and training, but where gross negligence, willful violations and destructive acts are not tolerated.

Principles 4 and 5 relate to the system conditions and context that affect work.

4. Demand and Pressure
“Demands and pressures relating to efficiency and capacity have a fundamental effect on performance.” (p. 14)
5. Resources & Constraints

“Success depends on adequate resources and appropriate constraints.” (p. 16)

Fluctuating demand creates far more varied and unpredictable problems for ATM than it does in nuclear.  However, in nuclear the potential for goal conflicts between production, cost and safety is always present.  The problem arises from acting as if these conflicts don’t exist.

ATM has to “cope with variable demand and variable resources,” a situation that is also different from nuclear with its base load plants and established resource budgets.  The authors opine that for ATM, “a rigid regulatory environment destroys the capacity to adapt constantly to the environment.” (p. 2) Most of us think of nuclear as quite constrained by procedures, rules, policies, regulations, etc., but an important lesson from Fukushima was that under unforeseen conditions, the organization must be able to adapt according to local, knowledge-based decisions  Even the NRC recognizes that “flexibility may be necessary when responding to off-normal conditions.”***

Principles 6 through 10 concern the nature of system behavior, with 9 and 10 more concerned with system outcomes.  These do not have specific implications for SC other than keeping an open mind and being alert to systemic issues, e.g., complacency, drift or emergent behavior.

6. Interactions and Flows
“Understand system performance in the context of the flows of activities and functions, as well as the interactions that comprise these flows.” (p. 18)
7. Trade-Offs
“People have to apply trade-offs in order to resolve goal conflicts and to cope with the complexity of the system and the uncertainty of the environment.” (p. 20)
8. Performance variability
“Understand the variability of system conditions and behaviour.  Identify wanted and unwanted variability in light of the system’s need and tolerance for variability.” (p. 22)
9. Emergence
“System behaviour in complex systems is often emergent; it cannot be reduced to the behaviour of components and is often not as expected.” (p. 24)
10. Equivalence
“Success and failure come from the same source – ordinary work.” (p. 26)

Work flow certainly varies in ATM but is relatively well-understood in nuclear.  There’s really not much more to say on that topic.

Trade-offs occur in decision making in any context where more than one goal exists.  One useful mental model for conceptualizing trade-offs is Hollnagel’s efficiency-thoroughness construct, basically doing things quickly (to meet the production and cost goals) vs. doing things well (to meet the quality and possibly safety goals).  We reviewed his work on Jan. 3, 2013.

Performance variability occurs in all systems, including nuclear, but the outcomes are usually successful because a system has a certain range of tolerance and a certain capacity for resilience.  Performance drift happens slowly, and can be difficult to identify from the inside.  Dekker’s work speaks to this and we reviewed it on Dec. 5, 2012.

Nuclear is not fully complex but surprises do happen, some of them not caused by component failure.  Emergence (problems that arise from new or unforeseen system interactions) is more likely to occur following the implementation of new technical systems.  We discussed this possibility in a July 6, 2013 post on a book by Woods, Dekker et al.

Equivalence means that work that results in both good and bad outcomes starts out the same way, with people (saboteurs excepted) trying to be successful.  When bad things happen, we should cast a wide net in looking for different factors, including systemic ones, that aligned (like Swiss cheese slices) in the subject case.

The white paper also includes several real and hypothetical case studies illustrating the application of the principles to understanding safety performance challenges 

Our Perspective 

The authors draw on a familiar cast of characters, including Dekker, Hollnagel, Leveson and Reason.  We have posted about all these folks, just click on their label in the right hand column.

The principles are intended to help us form a more insightful mental model of a system under consideration, one that includes non-linear cause and effect relationships, and the possibility of emergent behavior.  The white paper is not a “must read” but may stimulate useful thinking about the nature of the nuclear operating organization.


*  European Organisation for the Safety of Air Navigation(EUROCONTROL), “Systems Thinking for Safety: Ten Principles” (Aug. 2014).  Thanks to Bill Mullins for bringing this white paper to our attention.

**  “[C]omplex systems involve large numbers of interacting elements and are typically highly dynamic and constantly changing with changes in conditions. Their cause-effect relations are non-linear; small changes can produce disproportionately large effects. Effects usually have multiple causes, though causes may not be traceable and are socially constructed.” (pp. 4-5)

Also see our Oct. 14, 2013 discussion of the California Independent System Operator for another example of a complex system.

***  “Work Processes,” NRC Safety Culture Trait Talk, no. 2 (July 2014), p. 1.  ADAMS ML14203A391.  Retrieved Oct. 8, 2014