Thursday, January 17, 2013

Adm. Hyman Rickover – Systems Thinker

The TMI-2 accident occurred in 1979. In 1983 the plant owner, General Public Utilities Corp. (GPU), received a report* from Adm. Hyman Rickover (the “Father of the Nuclear Navy”) recommending that GPU be permitted by the NRC to restart the undamaged TMI Unit 1 reactor. We are not concerned with the report's details or conclusions but one part caught our attention.

The report begins by describing Rickover's seven principles for successful nuclear operation. One of these principles is the “Concept of Total Responsibility” which he explains as follows: “Operating nuclear plants safely requires adherence to a total concept wherein all elements are recognized as important and each is constantly reinforced. Training, equipment maintenance, technical support, radiological control, and quality control are essential elements but safety is achieved through integrating them effectively in operating decisions.” (p. 9, emphasis added)

We think the foregoing sounds like version 1.0 of points we have been emphasizing in this blog, namely:
  • Performance over time is the result of relationships and interactions among organizational components, in other words, the system is what's important.
  • Decisions are where the rubber meets the road in terms of goals, priorities and resource allocation; the extant safety culture provides a context for decision-making.
  • Safety performance is an emergent organizational property, a result of system activities, and cannot be predicted by examining individual system components.
We salute Adm. Rickover for his prescient insights.


* Adm. H.G. Rickover, “An Assessment of the GPU Nuclear Corporation Organization and Senior Management and Its Competence to Operate TMI-1” (Nov. 19, 1983). Available from Dickinson College library here.

Thursday, January 10, 2013

NRC Non-Regulation of Safety Culture: Fourth Quarter Update

NRC SC Brochure ML113490097
On March 17, July 3 and October 17, 2012 we posted on NRC safety culture (SC) related activities with individual licensees. This post highlights selected NRC actions during the fourth quarter, October through December 2012. We report on this topic to illustrate how the NRC squeezes plants on SC even if the agency is not officially regulating SC.

Prior posts mentioned Browns Ferry, Fort Calhoun and Palisades as plants where the NRC was undertaking significant SC-related activities. It appears none of those plants has resolved its SC issues.

Browns Ferry

An NRC supplemental inspection report* contained the following comment on a licensee root cause analysis: “Inadequate emphasis on the importance of regulatory compliance has contributed to a culture which lacks urgency in the identification and timely resolution of issues associated with non-compliant and potentially non-conforming conditions.” Later, the NRC observes “This culture change initiative [to address the regulatory compliance issue] was reviewed and found to still be in progress. It is a major corrective action associated with the upcoming 95003 inspection and will be evaluated during that inspection.” (Two other inspection reports, both issued November 30, 2012, noted the root cause analyses had appropriately considered SC contributors.)

An NRC-TVA public meeting was held December 5, 2012 to discuss the results of the supplemental inspections.** Browns Ferry management made a presentation to review progress in implementing their Integrated Improvement Plan and indicated they expected to be prepared for the IP 95003 inspection (which will include a review of the plant's third party SC assessment) in the spring of 2013.

Fort Calhoun

SC must be addressed to the NRC’s satisfaction prior to plant restart. The NRC's Oct. 2, 2012 inspection report*** provided details on the problems identified by the Omaha Public Power District (OPPD) in the independent Fort Calhoun SC assessment, including management practices that resulted “. . . in a culture that valued harmony and loyalties over standards, accountability, and performance.”

Fort Calhoun's revision 4 of its improvement plan**** (the first revision issued since Exelon took over management of the plant in September, 2012) reiterates management's previous commitments to establishing a strong SC and, in a closely related area, notes that “The Corrective Action Program is already in place as the primary tool for problem identification and resolution. However, CAP was not fully effective as implemented. A new CAP process has been implemented and root cause analysis on topics such as Condition Report quality continue to create improvement actions.”

OPPD's progress report***** at a Nov. 15, 2012 public meeting with the NRC includes over two dozen specific items related to improving or monitoring SC. However, the NRC restart checklist SC items remain open and the agency will be performing an IP 95003 inspection of Fort Calhoun SC during January-February, 2013.^

Palisades

Palisades is running but still under NRC scrutiny, especially for SC. The Nov. 9, 2012 supplemental inspection report^^ is rife with mentions of SC but eventually says “The inspection team concluded the safety culture was adequate and improving.” However, the plant will be subject to additional inspection efforts in 2013 to “. . . ensure that you [Palisades] are implementing appropriate corrective actions to improve the organization and strengthen the safety culture on site, as well as assessing the sustainability of these actions.”

At an NRC-Entergy public meeting December 11, Entergy's presentation focused on two plant problems (DC bus incident and service water pump failure) and included references to SC as part of the plant's performance recovery plan. The NRC presentation described Palisades SC as “adequate” and “improving.”^^^

Other Plants

NRC supplemental inspections can require licensees to assess “whether any safety culture component caused or significantly contributed to” some performance issue. NRC inspection reports note the extent and adequacy of the licensee’s assessment, often performed as part of a root cause analysis. Plants that had such requirements laid on them or had SC contributions noted in inspection reports during the fourth quarter included Braidwood, North Anna, Perry, Pilgrim, and St. Lucie. Inspection reports that concluded there were no SC contributors to root causes included Kewaunee and Millstone.

Monticello got a shout-out for having a strong SC. On the other hand, the NRC fired a shot across the bow of Prairie Island when the NRC PI&R inspection report included an observation that “. . . while the safety culture was currently adequate, absent sustained long term improvement, workers may eventually lose confidence in the CAP and stop raising issues.”^^^^ In other words, CAP problems are linked to SC problems, a relationship we've been discussing for years.

The NRC perspective and our reaction

Chairman Macfarlane's speech to INPO mentioned SC: “Last, I would like to raise “safety culture” as a cross-cutting regulatory issue. . . . Strengthening and sustaining safety culture remains a top priority at the NRC. . . . Assurance of an effective safety culture must underlie every operational and regulatory consideration at nuclear facilities in the U.S. and worldwide.”^^^^^

The NRC claims it doesn't regulate SC but isn't “assurance” part of “regulation”? If NRC practices and procedures require licensees to take actions they might not take on their own, don't the NRC's activities pass the duck test (looks like a duck, etc.) and qualify as de facto regulation? To repeat what we've said elsewhere, we don't care if SC is regulated but the agency should do it officially, through the front door, and not by sneaking in the back door.


*  E.F. Guthrie (NRC) to J.W. Shea (TVA), “Browns Ferry Nuclear Plant NRC Supplemental Inspection Report 05000259/2012014, 05000260/2012014, 05000296/2012014” (Nov. 23, 2012) ADAMS ML12331A180.

**  E.F. Guthrie (NRC) to J.W. Shea (TVA), “Public Meeting Summary for Browns Ferry Nuclear Plant, Docket No. 50-259, 260, and 296” (Dec. 18, 2012) ADAMS ML12353A314.

***  M. Hay (NRC) to L.P. Cortopassi (OPPD), “Fort Calhoun - NRC Integrated Inspection Report Number 05000285/2012004” (Oct. 2, 2012) ADAMS ML12276A456.

****  T.W. Simpkin (OPPD) to NRC, “Fort Calhoun Station Integrated Performance Improvement Plan, Rev. 4” (Nov. 1, 2012) ADAMS ML12311A164.

*****  NRC, “Summary of November 15, 2012, Meeting with Omaha Public Power District” (Dec. 3, 2012) ADAMS ML12338A191.

^  M. Hay (NRC) to L.P. Cortopassi (OPPD), “Fort Calhoun Station – Notification of Inspection (NRC Inspection Report 05000285/2013008)” (Dec. 28, 2012) ADAMS ML12363A175.

^^  S. West (NRC) to A. Vitale (Entergy), “Palisades Nuclear Plant - NRC Supplemental Inspection Report 05000255/2012011; and Assessment Follow-up Letter” (Nov. 9, 2012) ADAMS ML12314A304.

^^^  O.W. Gustafson (Entergy) to NRC, Entergy slides to be presented at the December 11, 2012 public meeting (Dec. 7, 2012) ADAMS ML12342A350. NRC slides for the same meeting ADAMS ML12338A107.

^^^^  K. Riemer (NRC) to J.P. Sorensen (NSP), “Prairie Island Nuclear Generating Plant, Units 1 and 2; NRC Biennial Problem Identification and Resolution Inspection Report 05000282/2012007; 05000306/2012007” (Sept. 25, 2012) ADAMS ML12269A253.

^^^^^  A.M. Macfarlane, “Focusing On The NRC Mission: Maintaining Our Commitment to Safety” speech presented at the INPO CEO Conference (Nov. 6, 2012) ADAMS ML12311A496.

Thursday, January 3, 2013

The ETTO Principle: Efficiency-Thoroughness Trade-Off by Erik Hollnagel

This book* was suggested by a regular blog visitor. Below we provide a summary of the book followed by our assessment of how it comports with our understanding of decision making, system dynamics and safety culture.

Hollnagel describes a general principle, the efficiency-thoroughness trade-off (ETTO), that he believes almost all decision makers use. ETTO means that people and organizations routinely make choices between being efficient and being thorough. For example, if demand for production is high, thoroughness (time and other resources spent on planning and implementing an activity) is reduced until production goals are met. Alternatively, if demand for safety is high, efficiency (resources spent on production) is reduced until safety goals are met. (pp. 15, 28) Greater thoroughness is associated with increased safety.

ETTO is used for many reasons, including resource limitations, the need to maintain resource reserves, and social and organizational pressure. (p. 17) In practice, people use shortcuts, heuristics and rationalizations to make their decision-making more efficient. At the individual level, there are many ETTO rules, e.g., “It will be checked later by someone else,” “It has been checked earlier by someone else,” and “It looks like a Y, so it probably is a Y.” At the organizational level, ETTO rules include negative reporting (where the absence of reporting implies that everything is OK), cost reduction imperatives (which increase efficiency at the cost of thoroughness), and double-binds (where the explicit policy is “safety first” but the implicit policy is “production takes precedence when goal conflicts arise”). The use of any of these rules can lead to a compromise of safety. (pp. 35-36, 38-39) As decision makers ETTO, individual and organizational performance varies. Most of the time, things work out all right but sometimes failures occur. 

How do failures occur? 

Failures can happen when people, going about their work activities in a normal manner, create a series of ETTOs that ultimately result in unacceptable performance. These situations are more likely to occur the more complex and closely coupled the work system is. The best example (greatly simplified in the following) is an accident victim who arrived at an ER just before shift change on a Friday night. Doctor A examined her, ordered a head scan and X-rays and communicated with the surgery, ICU and radiology residents and her relief, Doctor B; Doctor B transferred the patient to the ICU, with care to be provided by the ICU and surgery residents; these residents and other doctors and staff provided care over the weekend. The major error was that everyone thought somebody else would read the patient's X-rays and make the correct diagnosis or, in the case of radiology doctors, did not carefully review the X-rays. On Monday, the rad tech who had taken the X-rays on Friday (and noticed an injury) asked the orthopedics resident about the patient; this resident had not heard of the case. Subsequent examination revealed that the patient had, along with her other injuries, a dislocated hip. (pp. 110-113) The book is populated with many other examples. 

Relation to other theorists 

Hollnagel refers to sociologist Charles Perrow, who believes some errors or accidents are unavoidable in complex, closely-coupled socio-technical organizations.** While Perrow used the term “interactiveness” (familiar vs unfamiliar) to grade complexity, Hollnagel updates it with “tractability” (knowable vs unknowable) to reflect his belief that in contemporary complex socio-technical systems, some of the relationships among internal variables and between variables and outputs are not simply “not yet specified” but “not specifiable.”

Both Hollnagel and Sidney Dekker identify with a type of organizational analysis called Resilience Engineering, which holds that complex organizations must be designed to safely adapt to environmental pressure and recover from inevitable performance excursions outside the zone of tolerance. Both authors reject the linear, deconstructionist approach of fault-finding after incidents or accidents, the search for human error or the broken part.

Assessment 

Hollnagel is a psychologist so he starts with the individual and then extends the ETTO principle to consider group or organizational behavior, finally extending it to the complex socio-technical system. He notes that such a system interacts with, attempts to control, and adapts to its environment, ETTOing all the while. System evolution is a strength but also makes the system more intractable, i.e., less knowable, and more likely to experience unpredictable performance variations. He builds on Perrow in this area but neither is a systems guy and, quite frankly, I'm not convinced either understands how complex systems actually work.

I feel ambivalence toward Hollnagel's thesis. Has he provided a new insight into decision making as practiced by real people, or has he merely updated terminology from earlier work (most notably, Herbert Simon's “satisficing”) that revealed that the “rational man” of classical economic theory really doesn't exist? At best, Hollnagel has given a name to a practice we've all seen and used and that is of some value in itself.

It's clear ETTO (or something else) can lead to failures in a professional bureaucracy, such as a hospital. ETTO is probably less obvious in a nuclear operating organization where “work to the procedure” is the rule and if a work procedure is wrong, then there's an administrative procedure to correct the work procedure. Work coordination and hand-offs between departments exhibit at least nominal thoroughness. But there is still plenty of room for decision-making short cuts, e.g., biases based on individual experience, group think and, yes, culture. Does a strong nuclear safety culture allow or tolerate ETTO? Of course. Otherwise, work, especially managerial or professional work, would not get done. But a strong safety culture paints brighter, tighter lines around performance expectations so decision makers are more likely to be aware when their expedient approaches may be using up safety margin.

Finally, Hollnagel's writing occasionally uses strained logic to “prove” specific points, the book needs a better copy editor, and my deepest suspicion is he is really a peripatetic academic trying to build a career on a relatively shallow intellectual construct.


* E. Hollnagel, The ETTO Principle: Efficiency-Thoroughness Trade-Off (Burlington, VT: Ashgate, 2009).

** C. Perrow, Normal Accidents: Living with High-Risk Technologies (New York: Basic Books, 1984).

Friday, December 28, 2012

Uh-oh, Delays at Vogtle

This Wall Street Journal article* reports that the new Vogtle units may be in construction schedule trouble. The article notes that the new, modular construction techniques being employed were expected to save time and dollars but may be having the opposite effect. In addition, and somewhat incredibly, the independent monitor is citing design changes as another cause of delays. Thought that lesson had been learned a hundred times in the nuclear industry.

Then there is the inevitable finger pointing:

“The delays and cost pressures have created friction between the construction partners and utility companies that will serve as the plant's owners, escalating into a series of lawsuits totaling more than $900 million.”

The Vogtle situation also serves as a reminder that nuclear safety culture (NSC) is applicable to the construction phase though to our recollection, there was not a lot of talk about it during the NRC’s policy statement development process. The escalating schedule and cost pressures at Vogtle also serve to remind us of how significant a factor such pressures can be in a “massive, complex, first-of-a-kind project” (to quote the Westinghouse spokesman). These situational conditions will be challenging construction workers and management who may not possess the same level of NSC experience or consciousness as nuclear operating organizations.


* R. Smith, “New Nuclear Plant Hits Some Snags,” Wall Street Journal online (Dec. 23, 2012).

Thursday, December 20, 2012

The Logic of Failure by Dietrich Dörner

This book was mentioned in a nuclear safety discussion forum so we figured this is a good time to revisit Dörner's 1989 tome.* Below we provide a summary of the book followed by our assessment of how it fits into our interest in decision making and the use of simulations in training.

Dörner's work focuses on why people fail to make good decisions when faced with problems and challenges. In particular, he is interested in the psychological needs and coping mechanisms people exhibit. His primary research method is observing test subjects interact with simulation models of physical sub-worlds, e.g., a malfunctioning refrigeration unit, an African tribe of subsistence farmers and herdsmen, or a small English manufacturing city. He applies his lessons learned to real situations, e.g., the Chernobyl nuclear plant accident.

He proposes a multi-step process for improving decision making in complicated situations then describes each step in detail and the problems people can create for themselves while executing the step. These problems generally consist of tactics people adopt to preserve their sense of competence and control at the expense of successfully achieving overall objectives. Although the steps are discussed in series, he recognizes that, at any point, one may have to loop back through a previous step.

Goal setting

Goals should be concrete and specific to guide future steps. The relationships between and among goals should be specified, including dependencies, conflicts and relative importance. When people don't do this, they can become distracted by obvious or unimportant (although potentially achievable) goals, or peripheral issues they know how to address rather than important issues that should be resolved. Facing performance failure, they may attempt to turn failure into success with doublespeak or blame unseen forces.

Formulate models and gather information

Good decision-making requires an adequate mental model of the system being studied—the variables that comprise the system and the functional relationships among them, which may include positive and negative feedback loops. The model's level of detail should be sufficient to understand the interrelationships among the variables the decision maker wants to influence. Unsuccessful test subjects were inclined to use a “reductive hypothesis,” which unreasonably reduces the model to a single key variable, or overgeneralization.

Information gathered is almost always incomplete and the decision maker has to decide when he has enough to proceed. The more successful test subjects asked more questions and made fewer decisions (than the less successful subjects) in the early time periods of the sim.

Predict and extrapolate

Once a model is formulated, the decision maker must attempt to determine how the values of variables will change over time in response to his decisions or internal system dynamics. One problem is predicting that outputs will change in a linear fashion, even as the evidence grows for a non-linear, e.g., exponential function. An exponential variable may suddenly grow dramatically then equally suddenly reverse course when the limits on growth (resources) are reached. Internal time delays mean that the effects of a decision are not visible until some time in the future. Faced with poor results, unsuccessful test subjects implement or exhibit “massive countermeasures, ad hoc hypotheses that ignore the actual data, underestimations of growth processes, panic reactions, and ineffectual frenetic activity.” (p. 152) Successful subjects made an effort to understand the system's dynamics, kept notes (history) on system performance and tried to anticipate what would happen in the future.
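To make the linear-extrapolation trap concrete, here is a minimal Python sketch. It is our own illustration, not a reproduction of any of Dörner's simulations; the function names, the doubling growth rate and the number of periods are all hypothetical choices made for the example. It compares a straight-line forecast, built from the first few observations, with a variable that actually grows exponentially.

```python
def exponential_series(start, rate, periods):
    """Values of a variable that grows by a fixed fraction each period."""
    return [start * (1 + rate) ** t for t in range(periods)]


def linear_forecast(observed, horizon):
    """Extend the series using the average step seen so far (straight-line thinking)."""
    step = (observed[-1] - observed[0]) / (len(observed) - 1)
    return [observed[-1] + step * k for k in range(1, horizon + 1)]


# Hypothetical numbers: the variable doubles each period (rate = 1.0).
actual = exponential_series(start=1.0, rate=1.0, periods=10)
observed, future = actual[:5], actual[5:]
forecast = linear_forecast(observed, horizon=len(future))

# Compare the straight-line guess with what the exponential process really does.
for period, (guess, real) in enumerate(zip(forecast, future), start=5):
    print(f"period {period}: linear guess {guess:6.1f}, actual {real:6.1f}")
```

Under these assumed numbers the straight-line guess falls further behind the actual value in every period, which is the kind of underestimation of growth processes the book describes. The sketch leaves out limits on growth and time delays, both of which would make the decision maker's job harder still.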

Plan and execute actions, check results and adjust strategy

“The essence of planning is to think through the consequences of certain actions and see whether those actions will bring us closer to our desired goal.” (p. 153) Easier said than done in an environment of too many alternative courses of action and too little time. In rapidly evolving situations, it may be best to create rough plans and delegate as many implementing decisions as possible to subordinates. A major risk is thinking that planning has been so complete that the unexpected cannot occur. A related risk is the reflexive use of historically successful strategies. “As at Chernobyl, certain actions carried out frequently in the past, yielding only the positive consequences of time and effort saved and incurring no negative consequences, acquire the status of an (automatically applied) ritual and can contribute to catastrophe.” (p. 172)

In the sims, unsuccessful test subjects often exhibited “ballistic” behavior: they implemented decisions but paid no attention to, i.e., did not learn from, the results. Successful subjects watched for the effects of their decisions, made adjustments and learned from their mistakes.

Dörner identified several characteristics of people who tended to end up in a failure situation. They failed to formulate their goals, didn't recognize goal conflict or set priorities, and didn't correct their errors. (p. 185) Their ignorance of interrelationships among system variables and the longer-term repercussions of current decisions set the stage for ultimate failure.

Assessment

Dörner's insights and models have informed our thinking about human decision-making behavior in demanding, complicated situations. His use and promotion of simulation models as learning tools was one starting point for Bob Cudlin's work in developing a nuclear management training simulation program. Like Dörner, we see simulation as a powerful tool to “observe and record the background of planning, decision making, and evaluation processes that are usually hidden.” (pp. 9-10)

However, this book does not cover the entire scope of our interests. Dörner is a psychologist interested in individuals; group behavior is beyond his range. He alludes to normalization of deviance but his references appear limited to the flouting of safety rules rather than a more pervasive process of slippage. More importantly, he does not address behavior that arises from the system itself, in particular adaptive behavior as an open system reacts to and interacts with its environment.

From our view, Dörner's suggestions may help the individual decision maker avoid common pitfalls and achieve locally optimum answers. On the downside, following Dörner's prescription might lead the decision maker to an unjustified confidence in his overall system management abilities. In a truly complex system, no one knows how the entire assemblage works. It's sobering to note that even in Dörner's closed,** relatively simple models many test subjects still had a hard time developing a reasonable mental model, and some failed completely.

This book is easy to read and Dörner's insights into the psychological traps that limit human decision making effectiveness remain useful.


* D. Dörner, The Logic of Failure: Recognizing and Avoiding Error in Complex Situations, trans. R. and R. Kimber (Reading, MA: Perseus Books, 1998). Originally published in German in 1989.

** One simulation model had an external input.

Wednesday, December 12, 2012

“Overpursuit” of Goals

We return to a favorite subject, the impact of goals and incentives on safety culture and performance. Interestingly, this subject comes up in an essay by Oliver Burkeman, “The Power of Negative Thinking,”* which may seem unusual as most people think of goals and achievement of goals as the product of a positive approach. Traditional business thinking is to set hard, quantitative goals, the bigger the better. But the future is inherently uncertain while goals generally are not. The counterintuitive argument suggests the most effective way to address future performance is to focus on worst case outcomes. Burkeman observes that “...rigid goals may encourage employees to cut ethical corners” and “Focusing on one goal at the expense of all other factors also can distort a corporate mission or an individual life…” and result in “...the ‘overpursuit’ of goals…” Case in point, yellow jerseys.

This raises some interesting points for nuclear safety. First we would remind our readers of Snowden’s Cynefin decision context framework, specifically his “complex” space which is indicative of where nuclear safety decisions reside. In this environment there are many interacting causes and effects, making it difficult or impossible to pursue specific goals along defined paths. Clearly an uncertain landscape. As Simon French argues: “Decision support will be more focused on exploring judgement and issues, and on developing broad strategies that are flexible enough to accommodate changes as the situation evolves.”** This would suggest the pursuit of specific, aspirational goals may be misguided or counterproductive.

Second, safety performance goals are hard to identify anyway. Is it the absence of bad outcomes? Or the maintenance of, say, a “strong” safety culture - whatever that is. One indication of the elusiveness of safety goals is their absence as targets in incentive programs. So there is probably little likelihood of overemphasizing safety performance as a goal. But is the same true for operational type goals such as capacity factor, refuel outage durations, and production costs? Can an overly strong focus on such short term goals, often associated with stretching performance, lead to overpursuit? What if large financial incentives are attached to the achievement of the goals?

The answer is not: “Safety is our highest priority”. More likely it is an approach that considers the complexity and uncertainty of nuclear operating space and the potential for hard goals to cut both ways. It might value how a management team prosecutes its responsibilities more than the outcome itself.


* O. Burkeman, “The Power of Negative Thinking,” Wall Street Journal online (Dec. 7, 2012).

** S. French, “Cynefin: repeatability, science and values,” Newsletter of the European Working Group “Multiple Criteria Decision Aiding,” series 3, no. 17 (Spring 2008) p. 2. We posted on Cynefin and French's paper here.

Wednesday, December 5, 2012

Drift Into Failure by Sidney Dekker

Sidney Dekker's Drift Into Failure* is a noteworthy effort to provide new insights into how accidents and other bad outcomes occur in large organizations. He begins by describing two competing world views, the essentially mechanical view of the world spawned by Newton and Descartes (among others), and a view based on complexity in socio-technical organizations and a systems approach. He shows how each world view biases the search for the “truth” behind how accidents and incidents occur.

Newtonian-Cartesian (N-C) Vision

Isaac Newton and René Descartes were leading thinkers during the dawn of the Age of Reason. Newton used the language of mathematics to describe the world while Descartes relied on the inner process of reason. Both believed there was a single reality that could be investigated, understood and explained through careful analysis and thought; complete knowledge was possible if investigators looked long and hard enough. The assumptions and rules that started with them, and were extended by others over time, have been passed on and most of us accept them, uncritically, as common sense, the most effective way to look at the world.

The N-C world is ruled by invariant cause-and-effect; it is, in fact, a machine. If something bad happens, then there was a unique cause or set of causes. Investigators search for these broken components, which could be physical or human. It is assumed that a clear line exists between the broken part(s) and the overall behavior of the system. The explicit assumption of determinism leads to an implicit assumption of time reversibility—because system performance can be predicted from time A if we know the starting conditions and the functional relationships of all components, then we can start from a later time B (the bad outcome) and work back to the true causes. (p. 84) Root cause analysis and criminal investigations are steeped in this world view.

In this view, decision makers are expected to be rational people who “make decisions by systematically and consciously weighing all possible outcomes along all relevant criteria.” (p. 3) Bad outcomes are caused by incompetent or, worse, corrupt decision makers. Fixes include more communications, training, procedures, supervision, exhortations to try harder and criminal charges.

Dekker credits Newton et al. for giving man the wherewithal to probe Nature's secrets and build amazing machines. However, the Newtonian-Cartesian vision is not the only way to view the world, especially the world of complex, socio-technical systems. For that a new model, with different concepts and operating principles, is required.

The Complex System

Characteristics

The sheer number of parts does not make a system complex, only complicated. A truly complex system is open (it interacts with its environment), has components that act locally and don't know the full effects of their actions, is constantly making decisions to maintain performance and adapt to changing circumstances, and has non-linear interactions (small events can cause large results) because of multipliers and feedback loops. Complexity is a result of the ever-changing relationships between components. (pp. 138-144)

Adding to the myriad information confronting a manager or observer, system performance is often optimized at the edge of chaos, where competitors are perpetually vying for relative advantage at an affordable cost.** The system is constantly balancing its efforts between exploration (which will definitely incur costs but may lead to new advantages) and exploitation (which reaps benefits of current advantages but will likely dissipate over time). (pp. 164-165)

The most important feature of a complex system is that it adapts to its environment over time in order to survive. And its environment is characterized by resource scarcity and competition. There is continuous pressure to maintain production and increase efficiency (and their visible artifacts: output, costs, profits, market share, etc.) and less visible outputs, e.g., safety, will receive less attention. After all, “Though safety is a (stated) priority, operational systems do not exist to be safe. They exist to provide a service or product . . . .” (p. 99) And the cumulative effect of multiple adaptive decisions can be an erosion of safety margins and a changed response of the entire system. Such responses may be beneficial or harmful, the latter being a drift into failure.

Drift by a complex system exhibits several characteristics. First, as mentioned above, it is driven by environmental factors. Second, drift occurs in small steps, so changes are hardly noticed, and may even be applauded if they result in local performance improvement; “. . . successful outcomes keep giving the impression that risk is under control” (p. 106) as a series of small decisions whittles away at safety margins. Third, these complex systems contain unruly technology (think deepwater drilling) where uncertainties exist about how the technology may be ultimately deployed and how it may fail. Fourth, there is significant interaction with a key environmental player, the regulator, and regulatory capture can occur, resulting in toothless oversight.

“Drifting into failure is not so much about breakdowns or malfunctioning of components, as it is about an organization not adapting effectively to cope with the complexity of its own structure and environment.” (p. 121) Drift, and occasionally accidents, occur because of ordinary system functioning: normal people going about their regular activities, making ordinary decisions “against a background of uncertain technology and imperfect information.” Accidents, like safety, can be viewed as an emergent system property, i.e., they are the result of system relationships but cannot be predicted by examining any particular system component.

Managers' roles

Managers should not try to transform complex organizations into merely complicated ones, even if it's possible. Complexity is necessary for long-term survival as it maximizes organizational adaptability. The question is how to manage in a complex system. One key is increasing the diversity of personnel in the organization. More diversity means less groupthink, more creativity and greater capacity for adaptation. In practice, this means validating minority opinions and encouraging dissent, reflecting on the small decisions as they are made, stopping to ponder why some technical feature or process is not working exactly as expected, and creating slack to reduce the chances of small events snowballing into large failures. With proper guidance, organizations can drift their way to success.

Accountability

Amoral and criminal behavior certainly exist in large organizations but bad outcomes can also result from normal system functioning. That's why the search for culprits (bad actors or broken parts) may not always be appropriate or adequate. This is a point Dekker has explored before, in Just Culture (briefly reviewed here), where he suggests using accountability as a means to understand the system-based contributors to failure and resolve those contributors in a manner that will avoid recurrence.

Application to Nuclear Safety Culture

A commercial nuclear power plant or fleet is probably not a complete complex system. It interacts with environmental factors but in limited ways; it's certainly not directly exposed to the Wild West competition of, say, the cell phone industry. Groupthink and normalization of deviance*** are constant threats. The technology is reasonably well understood but changes, e.g., uprates based on more software-intensive instrumentation and control, may be invisibly sanding away safety margin. Both the industry and the regulator would deny regulatory capture has occurred but an outside observer may think the relationship is a little too cozy. Overall, the fit is sufficiently good that students of safety culture should pay close attention to Dekker's observations.

In contrast, the Hanford Waste Treatment Plant (Vit Plant) is almost certainly a complex system and this book should be required reading for all managers in that program.

Conclusion

Drift Into Failure is not a quick read. Dekker spends a lot of time developing his theory, then circling back to further explain it or emphasize individual pieces. He reviews incidents (airplane crashes, a medical error resulting in patient death, software problems, public water supply contamination) and descriptions of organization evolution (NASA, international drug smuggling, “conflict minerals” in Africa, drilling for oil, terrorist tactics, Enron) to illustrate how his approach results in broader and arguably more meaningful insights than the reports of official investigations. Standing on the shoulders of others, especially Diane Vaughan, Dekker gives us a rich model for what might be called the “banality of normalization of deviance.” 


* S. Dekker, Drift Into Failure: From Hunting Broken Components to Understanding Complex Systems (Burlington VT: Ashgate 2011).

** See our Sept. 4, 2012 post on Cynefin for another description of how the decisions an organization faces can suddenly slip from the Simple space to the Chaotic space.

*** We have posted many times about normalization of deviance, the corrosive organizational process by which yesterday's “unacceptable” becomes today's “good enough.”