Saturday, July 6, 2013

Behind Human Error by Woods, Dekker, Cook, Johannesen and Sarter

This book* examines how errors occur in complex socio-technical systems.  The authors' thesis is that behind every ascribed “human error” there is a “second story” of the context (conditions, demands, constraints, etc.) created by the system itself.  “That which we label ‘human error’ after the fact is never the cause of an accident.  Rather, it is the cumulative effect of multiple cognitive, collaborative, and organizational factors.” (p. 35)  In other words, “Error is a symptom indicating the need to investigate the larger operational systems and the organizational context in which it functions.” (p. 28)  This post presents a summary of the book followed by our perspective on its value.  (The book has a lot of content so this will not be a short post.)

The Second Story

This section establishes the authors' view of error and how socio-technical systems function.  They describe two mutually exclusive world views: (1) “erratic people degrade an otherwise safe system” vs. (2) “people create safety at all levels of the socio-technical system by learning and adapting . . .” (p. 6)  It should be obvious that the authors favor option 2.

In such a world “Failure, then, represents breakdowns in adaptations directed at coping with complexity.  Indeed, the enemy of safety is not the human: it is complexity.” (p. 1)  “. . . accidents emerge from the coupling and interdependence of modern systems.” (p. 31) 

Adaptation occurs in response to pressures or environmental changes.  For example, systems are under stakeholder pressure to become faster, better, cheaper; multiple goals and goal conflict are regular complex system characteristics.  But adaptation is not always successful.  There may be too little (rules and procedures are followed even though conditions have changed) or too much (adaptation is attempted with insufficient information to achieve goals).  Because of pressure, adaptations evolve toward performance boundaries, in particular, safety boundaries.  There is a drift toward failure. (see Dekker, reviewed here)

The authors present 15 premises for analyzing errors in complex socio-technical systems. (pp. 19-30)  Most are familiar but some are worth highlighting and remembering when thinking about system errors:

  • “There is a loose coupling between process and outcome.”  A “bad” process does not always produce bad outcomes and a “good” process does not always produce good outcomes.
  • “Knowledge of outcome (hindsight) biases judgments about process.”  More about that later.
  • “Lawful factors govern the types of erroneous actions or assessments to be expected.”   In other words, “errors are regular and predictable consequences of a variety of factors.”
  • “The design of artifacts affects the potential for erroneous actions and paths towards disaster.”  This is Human Factors 101 but problems still arise.  “Increased coupling increases the cognitive demands on practitioners.”  Increased coupling plus weak feedback can create a latent failure.

Complex Systems Failure


This section covers traditional mental models used for assessing failures and points out the putative inadequacies of each.  The sequence-of-events (or domino) model is familiar Newtonian causal analysis.  Man-made disaster theory puts company culture and institutional design at the heart of the safety question.  Vulnerability develops over time but is hidden by the organization’s belief that it has risk under control.  A system or component is driven into failure.  The latent failure (or Swiss cheese) model proposes that “disasters are characterized by a concatenation of several small failures and contributing events. . .” (p. 50)  While a practitioner may be closest to an accident, the associated latent failures were created by system managers, designers, maintainers or regulators.  All these models reinforce the search for human error (someone untrained, inattentive or a “bad apple”) and the customary fixes (more training, procedure adherence and personal attention, or targeted discipline).  They represent a failure to adopt systems thinking and concepts of dynamics, learning, adaptation and the notion that a system can produce accidents as a natural consequence of its normal functioning.

A more sophisticated set of models is then discussed.  Perrow's normal accident theory says that “accidents are the structural and virtually inevitable product of systems that are both interactively complex and tightly coupled.” (p. 61)  Such systems structurally confuse operators and prevent them from recovering when incipient failure is discovered.  People are part of the Perrowian system and can exhibit inadequate expertise.  Control theory sees systems as composed of components that must be kept in dynamic equilibrium based on feedback and continual control inputs—basically a system dynamics view.  Accidents are a result of normal system behavior and occur when components interact to violate safety constraints and the feedback (and control inputs) do not reflect the developing problems.  Small changes in the system can lead to huge consequences elsewhere.  Accident avoidance is based on making system performance boundaries explicit and known although the goal of efficiency will tend to push operations toward the boundaries.  In contrast, the authors would argue for a different focus: making the system more resilient, i.e., error-tolerant.**  High reliability theory describes how high-hazard activities can achieve safe performance through leadership, closed systems, functional decentralization, safety culture, redundancy and systematic learning.  High reliability means minimal variations in performance, which, in the short term, means safe performance, but HROs are subject to incidents indicative of residual system noise and unseen changes from social forces, information management or new technologies. (See Weick, reviewed here)

Standing on the shoulders of the above sophisticated models, resilience engineering (RE) is proposed as a better way to think about safety.  According to this model, accidents “represent the breakdowns in the adaptations necessary to cope with the real world complexity.” (p. 83)  The authors use the Columbia space shuttle disaster to illustrate patterns of failure evident in complex systems: drift toward failure, past success as reason for continued confidence, fragmented problem-solving, ignoring new evidence and intra-organizational communication breakdowns.  To oppose or compensate for these patterns, RE proposes monitoring or enhancing other system properties including: buffering capacity, flexibility, margin and tolerance (which means replacing quick collapse with graceful degradation).  RE “focuses on what sustains or erodes the adaptive capacities of human-technical systems in a changing environment.” (p. 93)  In practice, that means detecting signs of increasing risk, having resources for safety available, and recognizing when and where to invest to offset risk.  It also requires focusing on organizational decision making, e.g., cross checks for risky decisions, the safety-production-efficiency balance and the reporting and disposition of safety concerns.  “Enhancing error tolerance, detection and recovery together produce safety.” (p. 26)

Operating at the Sharp End

An organization's sharp end is where practitioners apply their expertise in an effort to achieve the organization's goals.  The blunt end is where support functions, from administration to engineering, work.  The blunt end designs the system, the sharp end operates it.  Practitioner performance is affected by cognitive activities in three areas: activation of knowledge, the flow of attention and interactions among multiple goals.

The knowledge available to practitioners arrives as organized content.  Challenges include: the organization of that content may be poor, and the content itself may be incomplete or simply wrong.  Practitioner mental models may be inaccurate or incomplete without the practitioners realizing it, i.e., they may be poorly calibrated.  Knowledge may be inert, i.e., not accessed when it is needed.  Oversimplifications (heuristics) may work in some situations but produce errors in others and limit the practitioner's ability to account for uncertainties or conflicts that arise in individual cases.  The discussion of heuristics suggests Hollnagel, reviewed here.

“Mindset is about attention and its control.” (p. 114)  Attention is a limited resource.  Problems with maintaining effective attention include loss of situational awareness, in which the practitioner's mental model of events doesn't match the real world, and fixation, where the practitioner's initial assessment of a situation creates a going-forward bias against accepting discrepant data and a failure to trigger relevant inert knowledge.  Mindset seems similar to HRO mindfulness. (see Weick)

Goal conflict can arise from many sources including management policies, regulatory requirements, economic (cost) factors and risk of legal liability.  Decision making must consider goals (which may be implicit), values, costs and risks—which may be uncertain.  Normalization of deviance is a constant threat.  Decision makers may be held responsible for achieving a goal but lack the authority to do so.  The conflict between cost and safety may be subtle or unrecognized.  “Safety is not a concrete entity and the argument that one should always choose the safest path misrepresents the dilemmas that confront the practitioner.” (p. 139)  “[I]t is difficult for many organizations (particularly in regulated industries) to admit that goal conflicts and tradeoff decisions arise.” (p. 139)  Overall, the authors present a good discussion of goal conflict.

How Design Can Induce Error


The design of computerized devices intended to help practitioners can instead lead to greater risks of errors and incidents.  Specific causes of problems include clumsy automation, limited information visibility and mode errors. 

Automation is supposed to increase user effectiveness and efficiency.  However, clumsy automation creates situations where the user loses track of what the computer is set up to do, what it's doing and what it will do next.  If support systems are so flexible that users can't know all their possible configurations, they adopt simplifying strategies which may be inappropriate in some cases.  Clumsy automation leads to more (instead of less) cognitive work, diverts user attention to the machine instead of the task, increases the potential for new kinds of errors and requires new user knowledge and judgments.  The machine effectively has its own model of the world, based on user inputs, data sensors and internal functioning, and passes that back to the user.

Machines often hide a mass of data behind a narrow keyhole of visibility into the system.  Successful design creates “a visible conceptual space meaningfully related to activities and constraints in a field of practice.” (p. 162)  In addition, “Effective representations highlight 'operationally interesting' changes for sequences of behavior . . .” (p. 167)  However, default displays typically do not make interesting events directly visible.

Mode errors occur when an operator initiates an action that would be appropriate if the machine were in mode A but, in fact, it's in mode B.  (This may be a man-machine problem but it's not the machine's fault.)  A machine can change modes based on situational and system factors in addition to operator input.  Operators have to maintain mode awareness, not an easy task when viewing a small, cluttered display that may not highlight current mode or mode changes.
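
To make the mode-error mechanism concrete, here is a minimal toy sketch in Python.  It is our own illustration, not an example from the book; the device, mode names and numbers are invented for the purpose.

    class InfusionPump:
        """Toy device with two modes; the same keypress means different things."""

        def __init__(self):
            self.mode = "RATE"   # the operator believes the pump is in RATE mode

        def auto_mode_change(self, alarm_condition):
            # The machine can change modes on its own, based on situational and
            # system factors rather than operator input.
            if alarm_condition:
                self.mode = "BOLUS"

        def press_increase(self):
            # Identical operator action, very different consequences by mode.
            if self.mode == "RATE":
                return "rate increased by 1 ml/hr"
            return "bolus dose of 10 ml delivered"

    pump = InfusionPump()
    pump.auto_mode_change(alarm_condition=True)  # mode changes; the cluttered display
                                                 # does not highlight the change
    print(pump.press_increase())                 # operator intended a small rate change,
                                                 # gets a 10 ml bolus -- a mode error

The point of the sketch is that nothing here is a machine malfunction; the hazard lies entirely in the gap between the machine's actual mode and the operator's belief about it.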

To cope with bad design “practitioners adapt information technology provided for them to the immediate tasks at hand in a locally pragmatic way, . . .” (p. 191)  They use system tailoring where they adapt the device, often by focusing on a feature set they consider useful and ignoring other machine capabilities.  They use task tailoring where they adapt strategies to accommodate constraints imposed by the new technology.  Both types of adaptation can lead to success or eventual failures. 

The authors suggest various countermeasures and design changes to address these problems. 

Reactions to Failure

Different approaches for analyzing accidents lead to different perspectives on human error. 

Hindsight bias is “the tendency for people to 'consistently exaggerate what could have been anticipated in foresight.'” (p. 15)  It reinforces the tendency to look for the human in the human error.  Operators are blamed for bad outcomes because they are available, tracking back to multiple contributing causes is difficult, most system performance is good and investigators tend to judge process quality by its outcome.  Outsiders tend to think operators knew more about their situation than they actually did.  Evaluating process instead of outcome is also problematic.  Process and outcome are loosely coupled and what standards should be used for process evaluation?  Formal work descriptions “underestimate the dilemmas, interactions between constraints, goal conflicts, and tradeoffs present in the actual workplace.” (p. 208)  A suggested alternative approach is to ask what other practitioners would have done in the same situation and build a set of contrast cases.  “What we should not do, . . . is rely on putatively objective external evaluations . . . such as . . . court cases or other formal hearings.  Such processes in fact institutionalize and legitimate the hindsight bias . . . leading to blame and a focus on individual actors at the expense of a system view.” (pp. 213-214)

Distancing through differencing is another risk.  In this practice, reviewers focus on differences between the context surrounding an accident and their own circumstance.  Blaming individuals reinforces belief that there are no lessons to be learned for other organizations.  If human error is local and individual (as opposed to systemic) then sanctions, exhortations to follow the procedures and remedial training are sufficient fixes.  There is a decent discussion of TMI here, where, in the authors' opinion, the initial sense of fundamental surprise and need for socio-technical fixes was soon replaced by a search for local, technologically-focused solutions.
      
There is often pressure to hold people accountable after incidents or accidents.  One answer is a “just culture” which views incidents as system learning opportunities but also draws a line between acceptable and unacceptable behavior.  Since the “line” is an attribution the key question for any organization is who gets to draw it.  Another challenge is defining the discretionary space where individuals alone have the authority to decide how to proceed.  There is more on just culture but this is all (or mostly) Dekker. (see our Just Culture commentary here)

The authors' recommendations for analyzing errors and improving safety can be summed up as follows: recognize that human error is an attribution; pursue second stories that reveal the multiple, systemic contributors to failure; avoid hindsight bias; understand how work really gets done; search for systemic vulnerabilities; study how practice creates safety; search for underlying patterns; examine how change will produce new vulnerabilities; use technology to enhance human expertise; and tame complexity. (p. 239)  “Safety is created at the sharp end as practitioners interact with hazardous processes . . . using the available tools and resources.” (p. 243)

Our Perspective

This is a book about organizational characteristics and socio-technical systems.  Recommendations and advice are aimed at organizational policy makers and incident investigators.  The discussion of a “just culture” is the only time culture is discussed in detail although safety culture is mentioned in passing in the HRO write-up.

Our first problem with the book is its repeated references to medicine, aviation, aircraft carrier operations and nuclear power plants as complex systems.***  Although medicine is definitely complex and aviation (including air traffic control) possibly is, carrier operations and nuclear power plants are simply complicated.  While carrier and nuclear personnel have to make some adaptations on the fly, they do not face sudden, disruptive changes in their technologies or operating environments and they are not exposed to cutthroat competition.  Their operations are tightly coordinated but, where possible, deliberately left more loosely coupled to facilitate recovery if operations start to go sour.  In addition, calling nuclear power operations complex perpetuates the myth that nuclear is “unique and special” and thus merits some special place in the pantheon of industry.  It isn't and it doesn't.

Our second problem relates to the authors' recasting of the nature of human error.  We decry the rush to judgment after negative events, particularly a search limited to identifying culpable humans.  The search for bad apples or outright criminals satisfies society's perceived need to bring someone to justice and the corporate system's desire to appear to fix things through management exhortations and training without really admitting systemic problems or changing anything substantive, e.g., the management incentive plan.  The authors' plea for more systemic analysis is thus welcome.

But they push the pendulum too far in the opposite direction.  They appear to advocate replacing all human errors (except for gross negligence, willful violations or sabotage) with systemic explanations, aka rationalizations.  What is never mentioned is that medical errors lead to tens of thousands of preventable deaths per year.****  In contrast, U.S. commercial aviation has not experienced over a hundred fatalities (excluding 9/11) since 1996; carriers and nuclear power plants experience accidents, but there are few fatalities.  At worst, this book is a denial that real human errors (including bad decisions, slip ups, impairments, coverups) occur and a rationalization of medical mistakes caused by arrogance, incompetence, class structure and lack of accountability.

This is a dense book, 250 pages of small print, with an index that is nearly useless.  Pressures (most likely cost and schedule) have apparently pushed publishing to the system boundary for copy editing—there are extra, missing and wrong words throughout the text.

This 2010 second edition updates the original 1994 monograph.  Many of the original ideas have been fleshed out elsewhere by the authors (primarily Dekker) and others.  Some references, e.g., Hollnagel, Perrow and the HRO school, should be read in their original form. 


*  D.D. Woods, S. Dekker, R. Cook, L. Johannesen and N. Sarter, Behind Human Error, 2d ed.  (Ashgate, Burlington, VT: 2010).  Thanks to Bill Mullins for bringing this book to our attention.

**  There is considerable overlap of the perspectives of the authors and the control theorists (Leveson and Rasmussen are cited in the book).  As an aside, Dekker was a dissertation advisor for one of Leveson's MIT students.

***  The authors' different backgrounds contribute to this mash-up.  Cook is a physician, Dekker is a pilot and some of Woods' cited publications refer to nuclear power (and aviation).

****  M. Makary, “How to Stop Hospitals From Killing Us,” Wall Street Journal online (Sept. 21, 2012).  Retrieved July 4, 2013.

Saturday, June 29, 2013

Timely Safety Culture Research

In this post we highlight the doctoral thesis paper of Antti Piirto, “Safe Operation of Nuclear Power Plants – Is Safety Culture an Adequate Management Method?”*  One reason for our interest is the author’s significant background in nuclear operations.**  Thus his paper has academic weight but is informed by direct management experience.

It would be impossible to credibly summarize all of the material and insights from this paper as it covers a wide swath of safety management, safety culture and associated research.  The pdf is 164 pages.  In this post we will provide an overview of the material with pointers to some aspects that seem most interesting to us.

Overview

The paper is developed from Piirto’s view that “Today there is universal acceptance of the significant impact that management and organisational factors have over the safety significance of complex industrial installations such as nuclear power plants. Many events with significant economic and public impact had causes that have been traced to management deficiencies.” (p. i)  It provides a comprehensive and useful overview of the development of safety management and safety culture thinking and methods, noting that all too often efforts to enhance safety are reactive.

“For many years it has been considered that managing a nuclear power plant was mostly a matter of high technical competence and basic managerial skills.” (p. 3)  And we would add, in many quarters there is a belief that safety management and culture simply flow from management “leadership”.  While leadership is an important ingredient in any management system, its inherent fuzziness leaves a significant gap in efforts to systematize methods and tools to enhance performance outcomes.  Again citing Piirto, safety culture is “especially vague to those carrying out practical safety work. Those involved...require explanations concerning how safety culture will alter their work” (p. 4).

Piirto also cites the prevalence in the nuclear industry of “unilateral thinking” and the lack of exposure to external criticism of current nuclear management approaches, accompanied by “homogeneous managerial rhetoric”. (p. 4)

“Safety management at nuclear power plants needs to become more transparent in order to enable us to ensure that issues are managed correctly.” (p. 6)  “Documented safety thinking provides the organisation with a common starting point for future development.” (p. 8)  Transparency and the documentation (preservation) of safety thinking resonates with us.  When forensic efforts have been made to dissect safety thinking (e.g., see Perin’s book Shouldering Risks) it is apparent how illuminating and educational such information can be.

Culture as Control Mechanism

Piirto describes organizational culture as “. . . a socially constructed, unseen, and unobservable force behind organisational activities.” (p. 13)  “It functions as an organisational control mechanism, informally approving or prohibiting behaviours.” (p. 14)

We would clarify that, as a control mechanism, culture is only one of perhaps many mechanisms that ultimately combine to determine actual behavior.  In our conceptual model, safety culture specifically can be thought of as a resistance to the non-safety pressures affecting people and their actions.  (See our post dated June 29, 2012.)  Piirto calls culture a “powerful lever” for guiding behavior. (p. 15)  The stronger the resident safety culture, the more leverage it has to keep other pressures in check.  However, it is almost inevitable that at some point non-safety pressures will be strong enough to compromise that leverage and perhaps lead to undesired outcomes.
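
To illustrate the resistance notion numerically, consider the following toy sketch (our own construction, not Piirto's formulation; the variable names, scales and numbers are arbitrary).

    def resulting_behavior(production_pressure, schedule_pressure, culture_strength):
        # Return 'safe' if the culture's resistance holds the competing pressures
        # in check, otherwise 'compromised'.  All inputs are judgments on a 0-1 scale.
        non_safety_pressure = production_pressure + schedule_pressure
        return "safe" if culture_strength >= non_safety_pressure else "compromised"

    print(resulting_behavior(0.4, 0.3, 0.8))   # safe: culture keeps the pressures in check
    print(resulting_behavior(0.6, 0.5, 0.8))   # compromised: pressures overwhelm the culture

Crude as it is, the sketch captures the two points made above: culture is only one term in the balance that determines behavior, and a sufficiently large non-safety pressure can overwhelm even a strong culture.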

Some of Piirto’s most useful insights can be found on p. 14 where he explains that culture at its essence is “a concept rather than a thing” - and a concept created in people’s minds.  We like the term “mental model” as well.  He goes on to caution that we must remember that culture is not just a set of structural elements or constructs - “It also is a dynamic process – a social construction that is undergoing continual reconstruction.”  Perhaps another way of saying this is to realize that culture cannot be understood apart from its application within an organization.  We think this is a significant weakness of culture surveys that tend to ask questions in the abstract, e.g., “Is safety a high priority?”, versus exploring precisely how safety priorities are exercised in specific decisions and actions of the organization.

Piirto reviews various anthropologic and sociologic theories of culture including debate about whether culture is a dependent or independent variable (p.18), the origins of safety culture, and culture surveys. (pp. 23-24)

Some other interesting content can be found starting at Section 2.2.7 (p. 29) where Piirto reviews approaches to the assessment of safety culture, which really amounts to asking what the practical reality associated with a culture is.  He notes that “the correlation between general preferences and specific behaviour is rather modest” and that “The Situational Approach suggests that the emphasis should be put on collecting data on actual practices, real dilemmas and decisions (what is also called ‘theories in use’) rather than on social norms.” (p. 29)

Knowledge Management and Training

Starting on p. 39 is a very useful discussion of Knowledge Management including its inherently dynamic nature.  Knowledge Management is seen as being at the heart of decision making and in assessing options for action.

In terms of theories of how people behave, there are two types, “...the espoused theory, or how people say they act, and the theory-in-use, or how people actually act. The espoused theory is easier to understand. It describes what people think and believe and how they say they act. It is often on a conscious level and can be easily changed by new ideas and information. However, it is difficult to be aware of the theory-in-use, and it is difficult to change...” (p. 46)

At this juncture we would like to have seen a closer connection between the discussions of Knowledge Management and safety management.  True, ensuring that individuals have the benefit of preserving, sustaining and increasing knowledge is important, but how exactly does that get reflected in safety management performance?  Piirto does draw an analogy to systematic approaches to training and proposes that a similar approach would benefit safety management: document how safety is related to practical work.  “This would turn safety culture into a concrete tool. Documented safety thinking provides the organisation with a common starting point for future development.” (p. 61)

One way to document safety thinking is through event investigation.  Piirto observes, “Event investigation is generally an efficient starting point for revealing the complex nature of safety management. The context of events reveals the complex interaction between people and technology in an organisational and cultural context. Event investigations should not only focus on events with high consequences; in most complex events a thorough investigation will reveal basic causes of great interest, particularly at the safety management level. Scientific studies of event investigation techniques and general descriptions of experience feedback processes have had a tendency to regard event investigations as too separated from a broader safety management context.”  (p. 113)

In the last sections of the paper Piirto summarizes the results of several research projects involving training and assessment of training effectiveness, knowledge management and organizational learning.  Generally these involve the development and training of shift personnel.

Take Away

Ultimately I’m not sure that the paper provides a simple answer to the question posed in its title: Is safety culture an adequate management method?  Purists would probably observe that safety culture is not a management method; on the other hand I think it is hard to ignore the reliance being placed by regulatory bodies on safety culture to help assure safety performance.  And much of this reliance is grounded in an “espoused theory” of behavior rather than a systematic, structured and documented understanding of actual behaviors and associated safety thinking.  Such “theory in use” findings would appear to be critical in connecting expectations for values and beliefs to actual outcomes.  Perhaps the best lesson offered in the paper is that there needs to be a much better overall theory of safety management that links cultural, knowledge management and training elements.


*  A. Piirto,  “Safe Operation of Nuclear Power Plants – Is Safety Culture an Adequate Management Method?” thesis for the degree of Doctor of Science in Technology (Tampere, Finland: Tampere Univ. of Technology, 2012).

**  Piirto has a total of 36 years in different management and supervision tasks in a nuclear power plant organization, including twelve years as the Manager of Operation for the Olkiluoto nuclear power plant.

Wednesday, June 26, 2013

Dynamic Interactive Training

The words dynamic and interactive always catch our attention as they are intrinsic to our world view of nuclear safety culture learning.  Carlo Rusconi’s presentation* at the recent IAEA International Experts’ Meeting on Human and Organizational Factors in Nuclear Safety in the Light of the Accident at the Fukushima Daiichi Nuclear Power Plant in Vienna in May 2013 is the source of our interest.

While much of the training described in the presentation appeared to be oriented to the worker level and the identification of workplace type hazards and risks, it clearly has implications for supervisory and management levels as well.

In the first part of the training students are asked to identify and characterize safety risks associated with workplace images.  For each risk they assign an index based on perceived likelihood and severity.  We like the parallel to our proposed approach for scoring decisions according to safety significance and uncertainty.**
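
As a minimal sketch of such an index (our interpretation only; the presentation does not give an explicit formula, so the scales and the multiplication are assumptions):

    # Assumed ordinal scales for a likelihood-severity risk index.
    LIKELIHOOD = {"rare": 1, "possible": 2, "likely": 3}
    SEVERITY = {"minor": 1, "serious": 2, "severe": 3}

    def risk_index(likelihood, severity):
        return LIKELIHOOD[likelihood] * SEVERITY[severity]

    # A student characterizing a hazard spotted in a workplace image:
    print(risk_index("possible", "severe"))   # 6 on a 1-9 scale

Whatever the exact scheme Rusconi uses, the value of the exercise is that students must make both judgments explicit rather than reacting to a hazard with an undifferentiated sense of riskiness.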

“...the second part of the course is focused on developing skills to look in depth at events that highlight the need to have a deeper and wider vision of safety, grasping the explicit and implicit connections among technological, social, human and organizational features. In a nutshell: a systemic vision.” (slide 13, emphasis added)  As part of the training students are exposed to the concepts of complexity, feedback and internal dynamics of a socio-technical system.  As the author notes, “The assessment of culture within an organization requires in-depth knowledge of its internal dynamics”.  (slide 15)

This part of the training is described as a “simulation” as it provides the opportunity for students to simulate the performance of an investigation into the causes of an actual event.  Students are organized into three groups of five persons to gain the benefit of collective analysis within each group followed by sharing of results across groups.  We see this as particularly valuable as it helps build common mental models and facilitates integration across individuals.  Last, the training session takes the students’ results and compares them to the outcomes from a panel of experts.  Again we see a distinct parallel to our concept of having senior management within the nuclear organization pre-analyze safety issues to establish reference values for safety significance, uncertainty and preferred decisions.  This provides the basis to compare trainee outcomes for the same issues and ultimately to foster alignment within the organization.

Thank you Dr. Rusconi.



*  C. Rusconi, “Interactive training: A methodology for improving Safety Culture,” IAEA International Experts’ Meeting on Human and Organizational Factors in Nuclear Safety in the Light of the Accident at the Fukushima Daiichi Nuclear Power Plant, Vienna May 21-24, 2013.

**  See our blog posts dated April 9 and June 6, 2013.  We also remind readers of Taleb’s dictate to decision makers to focus on consequences versus probability in our post dated June 18, 2013.

Tuesday, June 25, 2013

Regulatory Creep

The NRC's assessment of safety culture (SC) is an example of regulatory creep.  It began with the requirement that licensees determine whether specific safety-related performance problems or cross-cutting issues were caused, in whole or in part, by SC deficiencies.  Then the 2011 SC Policy Statement attempted to put a benign face on NRC intrusiveness because a policy statement is not a regulation.  However, licensees are “expected” to comply with the policy statement's goals and guidance; the NRC “expectations” become de facto regulations.

We have griped about this many times.*  But why does regulatory creep occur?  Is it inevitable?  We'll start with some background then look at some causes.

In the U.S., Congress passes and the President approves major legislative acts.  These are top-level policy statements characterized by lofty goals and guiding principles.  Establishing the detailed rules (which have the force of law) for implementing these policies falls to government bureaucrats in regulatory agencies.  There are upwards of 50 such agencies in the federal government, some part of executive branch departments (headed by a Cabinet-level officer), others functioning independently, i.e., reporting to Congress, with their governing boards (commissioners) appointed by the President subject to Congressional approval.  The NRC is one of the independent federal regulatory agencies.

Regulatory rules are proposed and approved following a specified, public process.  But once they are in place, multiple forces can lead to the promulgation of new rules or an expanded interpretation or application of existing rules (creep).  The forces for change can arise internally or externally to the agency.  Internal forces include the perceived need to address new real or imagined issues, a fear of losing control as the regulated entities adapt and evolve, or a generalized drive to expand regulatory authority.  Even bureaucrats can have a need for more power or a larger budget.

External sources include interest groups (and their lobbyists), members of Congress who serve on oversight committees, highly motivated members of the public or the agency's own commissioners.  We classify commissioners as external because they are not really part of an agency; they are political appointees of the President, who has a policy agenda.  In addition, a commissioner may owe a debt or allegiance to a Congressional sponsor who promoted the commissioner's appointment.

Given all the internal and external forces, it appears that new rules and regulatory creep are inevitable absent the complete capture of the agency by its nominally regulated entities.  Creep means a shifting boundary of what is required, what is allowed, what is tolerated and what will be punished—without a formal rule making.  The impact of creep on the regulated entities is clear: increased uncertainty and cost.  They may not care for increased regulatory intrusiveness but they know the penalty may be high if they fail to comply.  When regulated entities perceive creep, they must make a business decision: comply or fight.  They often choose to comply simply because if they fight and lose, they risk even more punitive formal regulation and higher costs.  If they fight and win, they risk alienating career bureaucrats who will then wait for an opportunity to exact retribution.  A classic lose-lose situation.

Our perspective

Years ago I took a poli-sci seminar where the professor said public policy forces could be boiled down to: Who's mad?  How mad?  And who's glad?  How glad?  I sometimes refer to that simple mental model when I watch the ongoing Kabuki between the regulator, its regulated entities and many, many political actors.  Regulatory creep is one of the outcomes of such dynamics.


*  For related posts, click the "Regulation of Safety Culture" label.

Regulatory creep is not confined to the NRC.  The motivation for this post was an item forwarded by a reader on reported Consumer Product Safety Commission (CPSC) activity.  Commenting on a recent settlement, a CPSC Commissioner “expressed concern that . . . the CPSC had insisted on a comprehensive compliance program absent evidence of widespread noncompliance and that ‘the compliance program language in [the] settlement is another step toward just such a de facto rule.’”  C.G. Thompson, “Mandated Compliance Programs as the New Normal?” American Conference Institute blog.  Retrieved June 6, 2013.

Tuesday, June 18, 2013

The Incredible Shrinking Nuclear Industry

News came last week that the San Onofre units would permanently shut down, joining Crystal River 3 (CR3) and Kewaunee as the latest early retirees and filling in the last leg of a nuclear bad-news trifecta.  This is distressing on many fronts, not the least of which is the loss of jobs for thousands of highly qualified nuclear personnel, and perhaps the suggestion of a larger trend.  Almost as distressing is NEI's characterization of San Onofre as a unique situation - as were CR3 and Kewaunee, by the way - and its placing of primary blame on the NRC.*  Really?  The more useful question to ponder is what decisions led up to the need for plant closures and whether there is a common denominator.

We can think of one: decisions that failed to adequately account for the “tail” of the risk distribution where outcomes, albeit of low probability, carry high consequences.  On this score checking in with Nick Taleb is always instructive.  He observes “This idea that in order to make a decision you need to focus on the consequences (which you can know) rather than the probability (which you can’t know) is the central idea of uncertainty.”**
  • For Kewaunee the decision to purchase the plant with a power purchase agreement (PPA) that extended only for eight years;
  • For CR3, the decision to undertake cutting the containment with in-house expertise;
  • For SONGS, the decision to purchase and install new-design steam generators from a vendor working beyond its historical experience envelope.
Whether or not the decision makers understood this, or even imagined that their decisions included the potential to lose the plants, the results speak for themselves.  These people were in Black Swan and fat-tail territory and didn’t realize it.  Let’s look at a few details.

Kewaunee

Many commentators at this point are writing off the Kewaunee retirement based on the miracle of low gas prices.  Dominion cites gas prices and the inability to acquire additional nuclear units in the upper Midwest to achieve economies of scale.  But there is a far greater misstep in the story.  When Dominion purchased Kewaunee from Wisconsin Public Service in 2005, a PPA was included as part of the transaction.  This is an expected and necessary element of such a deal as it establishes set prices for the sale of the plant’s output for a period of time.  A key consideration in structuring deals such as this is not only the specific pricing terms for the asset and the PPA, but the duration of the PPA.  In the case of Kewaunee the PPA ran for only 8 years, through December 2013.  After 8 years Dominion would have to negotiate another PPA with the local utilities or others or sell into the market.  The question is: when buying an asset with a useful life of 28 years (given the grant of the 20-year license extension), why would Dominion be OK with just an 8-year PPA?  Perhaps Dominion assumed that market prices would be higher in 8 years and wanted to capitalize on those higher prices.  Opponents to the transaction believed this to be the case.***  The prevailing expectation at the time was that demand would continue along with appropriate pricing necessary to accommodate current and planned generating units.  But the economic downturn capped demand and left a surplus of baseload.  Local utilities faced with the option of negotiating a PPA for Kewaunee - or thinning the field and protecting their own assets - did what was in their interest.

The reality is that Dominion rolled the dice on future power prices.  Interestingly, in the same time frame, 2007, the Point Beach units were purchased by NextEra Energy Resources (formerly FPL Energy).  In this transaction PPAs were negotiated through the end of the extended license terms of the units, 2030 and 2033, providing the basis for a continuing and productive future.

Crystal River 3

In 2009 Progress Energy undertook a project to replace the steam generators in CR3.  As with some other nuclear plants this necessitated cutting into the containment to allow removal of the old generators and placement of the new. 

Apparently just two companies, Bechtel and SGT, had managed all the previous 34 steam generator replacement projects at U.S. nuclear power plants. Of those, at least 13 had involved cutting into the containment building. All 34 projects were successful.

For the management portion of the job, Progress got bids from both Bechtel and SGT. The lowest was from SGT but Progress opted to self-manage the project to save an estimated $15 million.  During the containment cutting process delamination of concrete occurred in several places.  Subsequently, an outside engineering firm hired to do the failure analysis stated that cutting the steel tensioning bands in the sequence done by Progress Energy, along with removal of the concrete, had caused the containment building to crack.  Progress Energy disagreed stating the cracks “could not have been predicted”.  (See Taleb’s view on uncertainty above.)

“Last year, the PSC endorsed a settlement agreement that let Progress Energy refund $288 million to customers in exchange for ending a public investigation of how the utility broke the nuclear plant.”****

When it came time to assess how to fix the damage, Progress Energy took a far more conservative and comprehensive approach.  They engaged multiple outside consultants and evaluated numerous possible repair options.  After Duke Energy acquired Progress, Duke engaged an independent, third-party review of the engineering and construction plan developed by Progress.  The independent review suggested that the cost was likely to be almost $1.5 billion. However, in the worst-case scenario, it could cost almost $3.5 billion and take eight years to complete.   “...the [independent consultant] report concluded that the current repair plan ‘appears to be technically feasible, but significant risks and technical issues still need to be resolved, including the ultimate scope of any repair work.’"*****  Ultimately consideration of the potentially huge cost and schedule consequences caused Duke to pull the plug.  Taleb would approve.

San Onofre

Southern California Edison undertook a project to replace its steam generators almost 10 years ago.  It decided to contract with Mitsubishi Heavy Industries (MHI) to design and construct the generators.  This would be new territory for Mitsubishi in terms of the size of the generators and design complexity.  Following installation and operation for a period of time, tube leakage occurred due to excessive vibrations.  The NRC determined that the problems in the steam generators were associated with errors in MHI's computer modeling, which led to underestimation of thermal hydraulic conditions in the generators.

“Success in developing a new and larger steam generator design requires a full understanding of the risks inherent in this process and putting in place measures to manage these risks….Based upon these observations, I am concerned that there is the potential that design flaws could be inadvertently introduced into the steam generator design that will lead to unacceptable consequences (e.g., tube wear and eventually tube plugging). This would be a disastrous outcome for both of us and a result each of our companies desire to avoid. In evaluating this concern, it would appear that one way to avoid this outcome is to ensure that relevant experience in designing larger sized steam generators be utilized. It is my understanding the Mitsubishi Heavy Industries is considering the use of Westinghouse in several areas related to scaling up of your current steam generator design (as noted above). I applaud your effort in this regard and endorse your attempt to draw upon the expertise of other individuals and company's to improve the likelihood of a successful outcome for this project.”#

Unfortunately, these concerns were raised by SCE after it had let the contract to Mitsubishi.  SCE placed (all of) its hopes on improving the likelihood of a successful outcome while at the same time stating that a design flaw would be “disastrous”.  They were right about the disaster part.

Take Away

These are cautionary tales on a significant scale.  Delving into how such high-risk (technical and financial) decisions were made and turned out so badly could provide useful lessons learned.  That doesn’t appear likely, given the interests of the parties and the fact that such introspection is inconsistent with the industry predicate of operational excellence.

With regard to our subject of interest, safety culture, the dynamics of safety decisions are subject to similar issues and bear directly on safety outcomes.  Recall that in our recent posts on implementing safety culture policy, we proposed a scoring system for decisions that includes the safety significance and uncertainty associated with the issue under consideration.  The analog to Taleb’s “central idea of uncertainty” is intentional and necessary.  Taleb argues you can’t know the probability of consequences.  We don’t disagree but as a “known unknown” we think it is useful for decision makers to recognize how uncertain the significance (consequences) may be and calibrate their decision accordingly.
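
As a minimal sketch of what that calibration might look like (our own illustration; the scales, threshold and wording are assumptions and not the full scoring system from our earlier posts):

    def worst_plausible_significance(significance, uncertainty):
        # Both inputs are judgments on a 1-5 scale; uncertainty widens the tail.
        return min(5, significance + uncertainty)

    def decision_guidance(significance, uncertainty):
        if worst_plausible_significance(significance, uncertainty) >= 4:
            return "treat as high consequence: add reviews, margin and exit ramps"
        return "normal decision process"

    # A 'moderate' issue carrying high uncertainty gets the conservative treatment.
    print(decision_guidance(significance=2, uncertainty=3))

The design choice mirrors Taleb's point: rather than trying to estimate a probability, the decision maker asks how bad the consequences could plausibly be and lets that answer drive the conservatism of the decision process.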


*  “Of course, it’s regrettable...Crystal River is closing, the reasons are easy to grasp, and they are unique to the plant. Even San Onofre, which has also been closed for technical reasons (steam generator problems there), is quite different in specifics and probable outcome. So – unfortunate, yes; a dire pox upon the industry, not so much.”  NEI Nuclear Notes (Feb. 7, 2013).  Retrieved June 17, 2013.  For the NEI/SCE perspective on regulatory foot-dragging and uncertainty, see W. Freebairn et al, "SoCal Ed to retire San Onofre nuclear units, blames NRC delays," Platts (June 7, 2013).  Retrieved June 17, 2013.  And "NEI's Peterson discusses politics surrounding NRC confirmation, San Onofre closure," Environment & Energy Publishing OnPoint (June 17, 2013).  Retrieved June 17, 2013.

**  N. Taleb, The Black Swan (New York: Random House, 2007), p. 211.  See also our post on Taleb dated Nov. 9, 2011.

***  The Customers First coalition that opposed the sale of the plant in 2004 argued: “Until 2013, a complex purchased-power agreement subject to federal jurisdiction will replace PSCW review. After 2013, the plant’s output will be sold at prices that are likely to substantially exceed cost.”  Customers First!, "Statement of Position: Proposed Sale of the Kewaunee Nuclear Power Plant April 2004" (April, 2004).  Retrieved June 17, 2013.

****  R. Trigaux, "Who's to blame for the early demise of Crystal River nuclear power plant?" Tampa Bay Times (Feb. 5, 2013).  Retrieved Jun 17, 2013.  We posted on CR3's blunder and unfolding financial mess on Nov. 11, 2011.

*****  "Costly estimates for Crystal River repairs," World Nuclear News (Oct. 2, 2012).  Retrieved June 17, 2013.

#  D.E. Nunn (SCE) to A. Sawa (Mitsubishi), "Replacement Steam Generators San Onofre Nuclear Generating Station, Units 2 & 3" (Nov. 30, 2004).  Copy retrieved June 17, 2013 from U.S. Senate Committee on Environment & Public Works, attachment to Sen. Boxer's May 28, 2013 press release.


Friday, June 14, 2013

Meanwhile, Back at the Vit Plant

Previous posts* have chronicled the safety culture (SC) issues raised at the Waste Treatment and Immobilization Plant (WTP aka the Vit plant) at the Department of Energy's (DOE's) Hanford site.  Both the DOE Office of River Protection (ORP) and the WTP contractor (Bechtel) have been under the gun to strengthen their SC.  On May 30, 2013 DOE submitted a progress report** to the Defense Nuclear Facilities Safety Board covering both DOE and Bechtel activities.

DOE ORP

Based on an assessment by an internal SC Integrated Project Team (IPT), ORP reported its progress on nine near-term SC improvement actions contained in the ORP SC Improvement Plan.  For each action, the IPT assessed degree of implementation (full, partial or none) and effectiveness (full, partial, or indeterminate).  The following table summarizes the actions and current status.




ORP has a lot of activities going on but only two are fully implemented and none is yet claimed to be fully effective.  In ORP's own words, “ORP made a substantial start toward improving its safety culture, but much remains to be done to demonstrate effective change. . . . Four of the nine actions were judged to be partially effective, and the other five were judged to be of indeterminate effectiveness at the time of evaluation due to the recent completion of some of the actions, and because of the difficulty in measuring safety culture change over a one-year time period.” (Smith, p. 1)

The top-level ORP actions look substantive but digging into the implementation details reveals many familiar tactics for addressing SC problems: lots of training (some yet to be implemented), new or updated processes and procedures, (incomplete) distribution of INPO booklets, and the creation of a new behavioral expectations poster (which is largely ignored).

SC elements have been added to senior management and supervisor performance plans.  That appears to mean these folks are supposed to periodically discuss SC with their people.  There's no indication whether such behavior will be included in performance review or compensation considerations.

ORP did attempt to address concerns with the Differing Professional Opinion (DPO) process.  DPO and Employee Concerns Program (ECP) training was conducted but some employees reported reservations about both programs.

A new issues management system has been well received by employees but needs greater promotion by senior managers to increase employees' willingness to raise issues and ask questions.  The revised ECP also needs increased senior management support.

The team pointed out that ORP does not have a SC management statement or policy.

Bechtel

There is much less detail available here.  The report says Bechtel's plan “contains 50 actions broken into six strategic improvement areas:

A. Realignment and Maintenance of Design and Safety Basis
B. Management Processes of the WTP NSQC
C. Timeliness of Issues Identification
D. Resolution, Roles, Responsibilities, Authorities, and Accountabilities
E. Management and Supervisory Behaviors
F. Construction Site-Unique Issues

“The scheduled completion date for the last actions is December 2013. Twenty-seven actions were complete as of March 31, 2013, with an additional 12 planned to be complete by June 30, 2013.” (p. 19)

“ORP has completed surveillances on 19 of the 27 completed actions identifying 7 opportunities for improvement.  Because changing an organization's culture takes time, the current oversight efforts are focused on verifying actions have been completed.” (ibid.)  In other words, there has been no evaluation of the effectiveness of Bechtel's actions.

Our perspective

The ORP program is a traditional approach aimed at incremental organizational performance improvement.  There is scant or no mention of what we'd call strategic concerns, e.g., recognizing and addressing schedule/budget/safety goal conflicts; decision making in a complex, dynamic environment with many external pressures; riding herd on Bechtel; or creating a sense of urgency with respect to SC.

The most surprising thing to us was how candid the assessment was (for one produced by an employee team) in describing the program's impact to date.  For example, as the IPT performed its assessment, it tried to determine if employees were aware of the SC actions or their effects.  The results were mixed: some employees see changes but many don't, or they sense a general change but are unaware of specifics, e.g., new or changed procedures.  In general, organizational emphasis on SC declined over the year and was not very visible to the average employee.

The team's most poignant item was a direct appeal for personal involvement by the ORP manager in the SC program.  That tells you everything you need to know about SC's priority at ORP.


*  Click on Vit Plant under Labels to see previous posts.

**  M. Moury (DOE) to P.S. Winokur (DNFSB), DOE completes Action 1-9 of the Department's Implementation Plan for DNFSB Recommendation 2011-1, Safety Culture at the Waste Treatment and Immobilization Plant (May 30, 2013).  A status summary memo from ORP's K.W. Smith and the IPT report are attached to the Moury letter.  Our thanks to Bill Mullins for bringing these documents to our attention.

Wednesday, June 12, 2013

McKinsey Quarterly Report on Decision Making Styles

A brief article* in the April McKinsey Quarterly describes a piece of early-stage academic research into different individuals' decision making styles at work.

This is not rigorous social science.  The 5,000 survey participants were self-selected readers of the McKinsey Quarterly and Harvard Business Review.  Survey responses showed a range of decision making preferences, from largely intuitive to exhaustive deliberation.  Further analysis identified five different types of decision-makers.  Each type has exposure to certain decision making risks based on the decision-maker's preference for, say, moving ahead quickly vs. lengthy analysis.  In other words, each type exhibits certain biases.

A practical application of this typology is to see which type best describes two very important people: you and your boss.  Self assessment is always valuable to identify current strengths and improvement opportunities.  Boss assessment may reveal why your boss sees things differently from you, and suggest ways you can support and complement your boss to help you both become more successful at work.


*  D. Lovallo and O. Sibony, “Early-stage research on decision-making styles,” McKinsey Quarterly (April 2013).  Retrieved June 11, 2013.  A pop-out button is on the right side of the text, about half-way down the article; pushing the button opens a slide show of the different decision making types.  A pdf of the article can be downloaded if one registers (free) at the site.