Monday, April 13, 2015

Safety-I and Safety-II: The Past and Future of Safety Management by Erik Hollnagel

This book* discusses two different ways of conceptualizing safety performance problems (e.g., near-misses, incidents and accidents) and safety management in socio-technical systems.  This post describes each approach and provides our perspective on Hollnagel’s efforts.  As usual, our interest lies in the potential value new ways of thinking can offer to the nuclear industry.

Safety-I

This is the common way of looking at safety performance problems.  It is reactive, i.e., it waits for problems to arise** and analytic, e.g., it uses specific methods to work back from the problem to its root causes.  The key assumption is that something in the system has failed or malfunctioned and the purpose of an investigation is to identify the causes and correct them so the problem will not recur.  A second assumption is that chains of causes and effects are linear, i.e., it is actually feasible to start with a problem and work back to its causes.  A third assumption is that a single solution (the “first story”) can be found. (pp. 86, 175-76)***  Underlying biases include the hindsight bias (p. 176) and the belief that the human is usually the weak link. (pp. 78-79)  The focus of safety management is minimizing the number of things that go wrong.

Our treatment of Safety-I is brief because we have reported on criticism of linear thinking/models elsewhere, primarily in the work of Dekker, Woods et al., and Leveson.  See our posts of Dec. 5, 2012; July 6, 2013; and Nov. 11, 2013 for details.

Safety-II

Safety-II is proposed as a different way to look at safety performance.  It is proactive, i.e., it looks at the ways work is actually performed on a day-to-day basis and tries to identify causes of performance variability and then manage them.  A key cause of variability is the regular adjustments people make in performing their jobs in order to keep the system running.  In Hollnagel’s view, “Finding out what these [performance] adjustments are and trying to learn from them can be more important than finding the causes of infrequent adverse outcomes!” (p. 149)  The focus of safety management is on increasing the likelihood that things will go right and developing “the ability to succeed under varying conditions, . . .” (p. 137).

Performance is variable because, among other reasons, people are always making trade-offs between thoroughness and efficiency.  They may use heuristics or have to compensate for something that is missing or take some steps today to avoid future problems.  The underlying assumption of Safety-II is that the same behaviors that almost always lead to successful outcomes can occasionally lead to problems because of performance variability that goes beyond the boundary of the control space.  A second assumption is that chains of causes and effects may be non-linear, i.e., a small variance may lead to a large problem, and may have an emergent aspect where a specific performance variability may occur then disappear or the Swiss cheese holes may momentarily line up exposing the system to latent hazards. (pp. 66, 131-32)  There may be multiple explanations (“second stories”) for why a particular problem occurred.  Finally, Safety-II accepts that there are often differences between Work-as-Imagined (esp. as imagined by folks at the blunt end) and Work-as-Done (by people at the sharp end). (pp. 40-41)***

The Two Approaches

Safety-I and Safety-II are not in some winner-take-all competitive struggle.  Hollnagel notes there are plenty of problems for which a Safety-I investigation is appropriate and adequate. (pp. 141, 146)

Safety-I expenditures are viewed as a cost (to reduce errors). (p. 57)  In contrast, Safety-II expenditures are viewed as bona fide investments to create more correct outcomes. (p. 166)

In all cases, organizational factors, such as safety culture, can impact safety performance and organizational learning. (p. 31)

Our Perspective

The more complex a socio-technical entity is, the more it exhibits emergent properties and the more appropriate Safety-II thinking is.  And nuclear has some elements of complexity.****  In addition, Hollnagel notes that a common explanation for failures that occur in a Safety-I world is “it was never imagined something like that could happen.” (p. 172)  To avoid being the one in front of the cameras saying that, it might be helpful for you to spend a little time reflecting on how Safety-II thinking might apply in your world.

Why do most things go right?  Is it due to strict compliance with procedures?  Does personal creativity or insight contribute to successful plant performance?  Do you talk with your colleagues about possible efficiency-thoroughness trade-offs (short cuts) that you or others make?  Can thinking about why things go right make one more alert to situations where things are heading south?  Does more automation (intended to reduce reliance on fallible humans) actually move performance closer to the control boundary because it removes the human’s ability to make useful adjustments?  Have any of your root cause evaluations appeared to miss other plausible explanations for why a problem occurred?

Some of the Safety-II material is not new.  Performance variability in Safety-II builds on Hollnagel’s earlier work on the efficiency-thoroughness trade-off (ETTO) principle.  (See our Jan. 3, 2013 post.)   His call for mindfulness and constant alertness to problems is straight out of the High Reliability Organization playbook. (pp. 36, 163-64)  (See our May 3, 2013 post.)

A definite shortcoming is the lack of concrete examples in the Safety-II discussion.  If someone has tried to do this, it would be nice to hear about it.

Bottom line, Hollnagel has some interesting observations although his Safety-II model is probably not the Next Big Thing for nuclear safety management.

 

*  E. Hollnagel, Safety-I and Safety-II: The Past and Future of Safety Management (Burlington, VT: Ashgate, 2014)

**  In the author’s view, forward-looking risk analysis is not proactive because it is infrequently performed. (p. 57) 

***  There are other assumptions in the Safety-I approach (see pp. 97-104) but for the sake of efficiency, they are omitted from this post.

****  Nuclear power plants have some aspects of a complex socio-technical system but other aspects are merely complicated.   On the operations side, activities are tightly coupled (one attribute of complexity) but most of the internal organizational workings are complicated.  The lack of sudden environmental disrupters (excepting natural disasters) means they have time to adapt to changes in their financial or regulatory environment, reducing complexity.

2 comments:

  1. Where are the advocates for Safety I?

    Who is willing to dispute the mischaracterization?

    Who is willing to show that the current approach, not "Safety I," has done a lot of good?

    Who is willing to show that resilience can be a primrose path to perdition?

    Isn't it too bad that most, if not all, major safety shortfalls involved either insufficient compliance, insufficient requirements, or both?

    I'm still looking for exceptions.

    All the best,

    Bill

  2. For some time I've contended that it has been the most reasonable view of both the Global and US Nuclear Energy Enterprises that collectively they comprise one of the most Complex, High-Consequence Circumstances in the history of the industrial age.

    The variety of regulatory regimes, the Tower of Babel in the selective use of generally accepted notions such as culture and risk, along with the Myth of Nuclear's Spectacular Exception from every other form of High Reliability enterprise governance structure give some examples of the unacknowledged complexity that dominates the public fate of the nuclear power business. Success in nuclear operations cannot be demonstrated one plant or site at a time - we've been told that repeatedly since TMI - most practitioners refuse to accept the message.

    The reliance for purposes of "risk" characterization on an utterly non-scientific measure of infinite potential hazard (the LNTH coupled with the ALARA Principle) provides an example of the "current approach" which my friend Dr. Bill is correct to note is somewhat different from Safety 1 - it is in fact of the IAEA/NRC framework hyper-linear to the point of rigidity, brittleness, and demands for orthodoxy which create massive double bind situations on a national scale. Binds of unrealism that, when they collapse, are sufficient that the entire Japanese nuclear energy industry came to be shut down as a result of the inevitable breach of faith between the promises of "excellence" (e.g. in sufficiency of requirements and compliance) at one site.

    The lack of regularly maintained distinction for governance purposes between aleatory variability (i.e. subject to PRA reliability assessment) and epistemic disrupters (e.g. price deregulation, ownership consolidation, life extension) is more than sufficient to suggest that those seemingly complicated internal organizational workings require a substantial degree of myopia if the proximity of complexifying factors is to be ignored.

    There is ample evidence in the history of major accident events that slowly evolving weakening of integrated safety management effectiveness is implicated in the onset of major accidents as frequently as the big tsunami type event. Yes firms often have time to adapt mindfully, but without investment in Resilient states of continual vigilance they too often don't put the time to good use. This is the message of INSAG-4 with its emphasis on the capacity to manage enterprise significant issues as they present, not as management hopes they will turn out to be.

    On net, the US NNEE is well along its Drift into Failure (at least in its present incarnation) - studious denial of the non-linear compounding of complexity is the largest factor in that demise. If one can't see that example, it's hard to know what would be recognizable.

