Tuesday, July 30, 2013

Introducing NuclearSafetySim

We have referred to NuclearSafetySim and the use of simulation tools on a regular basis in this blog.  NuclearSafetySim is our initiative to develop a new approach to safety management training for nuclear professionals.  It uses a simulator to provide a realistic nuclear operations environment in which players are challenged by emergent issues, where they must make decisions balancing safety implications against other priorities, over a five-year period.  Each player earns an overall score and receives analyses and data on his/her decision making and performance against goals.  It is clearly a different approach to safety culture training, one that attempts to operationalize the values and traits espoused by various industry bodies.  In that regard it mirrors exactly what nuclear professionals must do on a day-to-day basis.

At this time we are making NuclearSafetySim available to our readers through a web-based demo version.  To get started, go to the NuclearSafetySim website and click on the Introduction tab at the top of the Home page.  There you will find a link to a narrated slide show that provides important background on the approach used in the simulation; it runs about 15 minutes.  Then click on the Simulation tab, where you will find another video, a demo of NuclearSafetySim.  While it runs about 45 minutes (apologies), it provides a comprehensive tutorial on the sim and how to interact with it.  We urge you to view it.  Finally, at the bottom of the Simulation page is a link to the NuclearSafetySim tool.  Clicking on the link brings you directly to the Home screen and you’re ready to play.

As you will see on the website and in the sim itself, there are reminders and links to facilitate providing feedback on NuclearSafetySim and/or requesting additional information.  This is important to us and we hope our readers will take the time to provide thoughtful input, including constructive criticism.  We welcome all comments. 

Wednesday, July 24, 2013

Leadership, Culture and Organizational Performance

As discussed in our July 18, 2013 post, INPO's position is that creating and maintaining a healthy safety culture (SC) is a primary leadership responsibility.*  That seems like a common-sense belief, but is it grounded in any social science?  What is the connection between leader behavior and culture?  And what is the connection between culture and organizational performance?

To help us address these questions, we turn to a paper** by some Stanford and UC Berkeley academics.  They review the relevant literature and present their own research and findings.  This paper is not a great fit with nuclear power operations, but some of the authors' observations and findings are useful.  One might think there would be ample material on this important topic, but “only a very few studies have actually explored the interrelationships among leadership, culture and performance.” (p. 33)

Leaders and Culture


Leaders can be described by different personality types.  Note that this approach does not focus on specific behavior, e.g., how leaders make decisions, but the attributes of each personality type certainly imply the kinds of behavior that can reasonably be expected.  The authors contend “. . . the myriad of potential personality and value constructs can be reliably captured by five essential personality constructs, the so-called Big Five or the Five Factor Model . . .” (p. 6)  You have probably all been exposed to the Big 5, or a similar taxonomy.  An individual may exhibit attributes of more than one type but can ultimately be classified as primarily representative of one specific type.  The five types are listed below, with a few selected attributes for each.
  • Agreeableness (Cooperative, Compromising, Compassionate, Trusting)
  • Conscientiousness (Orderly, Reliable, Achievement oriented, Self-disciplined, Deliberate, Cautious)
  • Extraversion (Gregarious, Assertive, Energy, Optimistic)
  • Neuroticism (Negative affect, Anxious, Impulsive, Hostile, Insecure)
  • Openness to Experience (Insightful, Challenge convention, Autonomous, Resourceful)

Leaders can affect culture and later we'll see that some personality types are associated with specific types of organizational culture.  “While not definitive, the evidence suggests that personality as manifested in values and behavior is associated with leadership at the CEO level and that these leader attributes may affect the culture of the organization, although the specific form of these relationships is not clear.” (p. 10)  “. . . senior leaders, because of their salience, responsibility, authority and presumed status, have a disproportionate impact on culture, . . .” (p. 11)

Culture and Organizational Performance

Let's begin with a conclusion: “One of the most important yet least understood questions is how organizational culture relates to organizational performance.” (p. 11)

To support their research model, the authors describe a framework, similar to the Big 5 for personality, for summarizing organizational cultures.  The Organizational Culture Profile (OCP) features seven types of culture, listed below with a few selected attributes for each. 

  • Adaptability (Willing to experiment, Taking initiative, Risk taking, Innovative)
  • Collaborative (Team-oriented, Cooperative, Supportive, Low levels of conflict)
  • Customer-oriented (Listening to customers, Being market driven)
  • Detail-oriented (Being precise, Emphasizing quality, Being analytical)
  • Integrity (High ethical standards, Being honest)
  • Results-Oriented (High expectations for performance, Achievement oriented, Not easy going)
  • Transparency (Putting the organization’s goals before the unit, Sharing information freely)

The linkage between culture and performance is fuzzy.  “While the strong intuition was that organizational culture should be directly linked to firm effectiveness, the empirical results are equivocal.” (p. 14)  “[T]he association of culture and performance is not straightforward and likely to be contingent on the firm’s strategy, the degree to which the culture promotes adaptability, and how widely shared and strongly felt the culture is.” (p. 17)  “Further compounding the issue is that the relationship between culture and firm performance has been shown to vary across industries.” (p. 11)  Finally, “although the [OCP] has the advantage of identifying a comprehensive set of cultural dimensions, there is no guarantee that any particular dimension will be relevant for a particular firm.” (p. 18)  I think it's fair to summarize the culture-performance literature by saying “It all depends.”

Research Results

The authors gathered and analyzed data on a group of high-technology firms: CEO personalities based on the Big 5 types, cultural descriptions using the OCP, and performance data.  Firm performance was based on financial metrics, firm reputation (an intangible asset) and employee attitudes.*** (pp. 23-24)

“[T]he results reveal a number of significant relationships between CEO personality and firm culture, . . . CEOs who were more extraverted (gregarious, assertive, active) had cultures that were more results-oriented. . . . CEOs who were more conscientious (orderly, disciplined, achievement-oriented) had cultures that were more detail-oriented . . . CEOs who were higher on openness to experience (ready to challenge convention, imaginative, willing to try new activities) [were] more likely to have cultures that emphasized adaptability.” (p. 26)

“Cultures that were rated as more adaptable, results-oriented and detail-oriented were seen more positively by their employees. Firms that emphasized adaptability and were more detail-oriented were also more admired by industry observers.” (p. 28)

In sum, the linkage between leadership and performance is far from clear.  But “consistent patterns of [CEO] behavior shape interpretations of what’s important [values] and how to behave. . . . Other research has shown that a CEO’s personality may affect choices of strategy and structure.” (p. 31)

Relevance to Nuclear Operations


As mentioned in the introduction, this paper is not a great fit with the nuclear industry.  The authors' research focuses on high-technology companies, there is nothing SC-specific, and their financial performance metrics (more important to firms in highly competitive industries) are more robust than their non-financial measures.  Safety performance is not mentioned.

But their framework stimulates us to ask important questions.  For example, based on the research results, what type of Chief Nuclear Officer (CNO) would you select for a plant with safety performance problems?  How about one facing significant economic challenges?  Or one where things are running smoothly?  Based on the OCP, what types of culture would be most supportive of a strong SC?  Would any types be inconsistent with a strong SC?  How would you categorize your organization's culture?

The authors suggest that “Senior leaders may want to consider developing the behaviors that cultivate the most useful culture for their firm, even if these behaviors do not come naturally to them.” (p. 35)  Is that desirable or practical for your CNO?

The biggest challenge to obtaining generalizable results, which the authors recognize, is that so many driving factors are situation-specific, i.e., dependent on a firm's industry, competitive position and relative performance.  They also recognize a possible weakness in linear causality, i.e., the leadership → culture → performance logic may not be one-way.  In our systems view, we'd say there are likely feedback loops, two-way influence flows and additional relevant variables in the overall model of the organization.

The linear (Newtonian) viewpoint promoted by INPO suggests that culture is mostly (solely?) created by senior executives.  If only it were that easy.  Such a view “runs counter to the idea that culture is a social construct created by many individuals and their behavioral patterns.” (p. 10)  We believe culture, including SC, is an emergent organizational property created by the integration of top-down activities with organizational history, long-serving employees, and strongly held beliefs and values, including the organization's “real” priorities.  In other words, SC is a result of the functioning over time of the socio-technical system.  In our view, a CNO can heavily influence, but not unilaterally define, organizational culture including SC.



*  As another example of INPO's position, a recent presentation by an INPO staffer ends with an Ed Schein quote: “...the only thing of real importance that leaders do is to create and manage culture...”  The quote is from Schein's Organizational Culture and Leadership (San Francisco, CA: Jossey-Bass, 1985), p. 2.  The presentation was A. Daniels, “How to Continuously Improve Cultural Traits for the Management of Safety,” IAEA International Experts’ Meeting on Human and Organizational Factors in Nuclear Safety in the Light of the Accident at the Fukushima Daiichi Nuclear Power Plant, Vienna, May 21-24, 2013.
 

**  C. O’Reilly, D. Caldwell, J. Chatman and B. Doerr, “The Promise and Problems of Organizational Culture: CEO Personality, Culture, and Firm Performance”  Working paper (2012).  Retrieved July 22, 2013.  To enhance readability, in-line citations have been removed from quotes.

***  The authors report “Several studies show that culture is associated with employee attitudes . . . ” (p. 14)

Thursday, July 18, 2013

INPO: Traits of a Healthy Nuclear Safety Culture

The Institute of Nuclear Power Operations (INPO) has released a document* that aims to align its previous descriptions of safety culture (SC) with current NRC SC terminology.  The document describes the essential traits and attributes of a healthy** nuclear SC.  “[A] trait is defined as a pattern of thinking, feeling, and behaving such that safety is emphasized over competing priorities. . . . The attributes clarify the intent of the traits.” (p. 3)  While there is an effort to align with NRC, the document remains consistent with INPO policy, viz., SC is a primary leadership responsibility.  Leaders are expected to regularly reinforce SC, measure SC in their organization and communicate what constitutes a healthy SC.

There are ten traits organized into three categories.  Each trait has multiple attributes and each attribute has representative observable behaviors that are supposed to evidence the attribute's existence, scope and strength.  Many of the behaviors stress management's responsibilities.  The report has too much detail to summarize in this post so we'll concentrate on one of the key SC artifacts we have repeatedly emphasized on this blog: decision making.

Decision making (DM) is one of the ten traits.  DM has three attributes: a consistent process, conservative bias and single-point accountability.  Risk insights are incorporated as appropriate.  Observable behaviors include the following (pp. 19-20):
  • The organization establishes a well-defined DM process
  • Individuals demonstrate an understanding of the DM process
  • Leaders seek inputs from different work groups or organizations
  • When previous decisions are called into question by new facts, leaders reevaluate these decisions
  • Conservative assumptions are used when determining whether emergent or unscheduled work can be conducted safely
  • Leaders take a conservative approach to DM, particularly when information is incomplete or conditions are unusual
  • Managers take timely action to address degraded conditions
  • Executives and senior managers reinforce the expectation that the reactor will be shut down when procedurally required, when the margin for safe operation has degraded unacceptably, or when the condition of the reactor is uncertain
  • Individuals do not rationalize assumptions for the sake of completing a task
  • The organization ensures that important nuclear safety decisions are made by the correct person at the lowest appropriate level

That's quite a mouthful, but it's not all of the behaviors, and some of the included ones have been shortened to fit.

In addition to the above, communicating, explaining, challenging and justifying individual decisions are mentioned throughout the document.  Finally, “Leaders demonstrate a commitment to safety in their decisions and behaviors.” (p. 15)

Our perspective

On the positive side, the INPO treatment of DM is much more comprehensive than what we've seen in the NRC Common Language Path Forward materials released to date.

But the DM example illustrates a major problem with this type of document: a lengthy laundry list of observable behaviors that can morph into de facto requirements.  Granted, INPO says “. . . this document is not intended to be used as a checklist. It is encouraged that this document be considered for inclusion and use in self-assessments, root cause analyses, and training content, as appropriate.” (p. 3)  But while the observable behaviors may be intended as representative or illustrative, in practice they are likely to become first expectations, then requirements.  An overall tone of absolutism reinforces this possibility.

The same tone is evident in the discussion of DM's larger context.  For example, INPO asserts that SC is a board and corporate responsibility, but explicit or implicit priorities from above can constrain plant management's DM flexibility.  INPO also says “Executives and senior managers ensure sufficient corporate resources are allocated to the nuclear organization for short- and long-term safe and reliable operation” (p. 15), but the top and bottom of the organization may not agree on what level of resources is “sufficient.”

Another problem is the lack of priorities or relative importance.  Are all the traits equally important?  How about the attributes?  And the observable behaviors?  Is it up to, say, a team of QA assessors to determine what they need to include?  Do they look only at what the boss says?  Or do they try to evaluate everything even remotely related to the scope of their inquiry?

But our biggest difficulty is with this statement: “These traits and attributes, when embraced, will be reflected in the values, assumptions, behaviors, beliefs, and norms of an organization and its members.” (p. 3)  This is naïve absolutism at its worst.  While some members of an organization may incorporate new values, others may comply with the rules and exhibit the desired behavior based on other factors, e.g., fear, peer pressure, desire for recognition or power, or money.  And ultimately, who cares why they do it?  As Commissioner Apostolakis said during an NRC meeting when the proposed SC policy was being discussed: “[W]e really care about what people do and maybe not why they do it. . . .”  (See our Feb. 12, 2011 post.)

We could not say it better ourselves.


*  Institute of Nuclear Power Operations (INPO), “Traits of a Healthy Nuclear Safety Culture,” INPO 12-012, Rev. 1 (April 2013).  The report has two addenda.  One describes nuclear safety behaviors and actions that contribute to a healthy nuclear SC by organizational level and the other provides cross-references to other INPO documents, the NRC ROP cross-cutting area components and the IAEA SC characteristics.  Thanks to Madalina Tronea for making these documents available.

** INPO refers to SC “health” while the NRC refers to SC “strength.”

Saturday, July 6, 2013

Behind Human Error by Woods, Dekker, Cook, Johannesen and Sarter

This book* examines how errors occur in complex socio-technical systems.  The authors' thesis is that behind every ascribed “human error” there is a “second story” of the context (conditions, demands, constraints, etc.) created by the system itself.  “That which we label ‘human error’ after the fact is never the cause of an accident.  Rather, it is the cumulative effect of multiple cognitive, collaborative, and organizational factors.” (p. 35)  In other words, “Error is a symptom indicating the need to investigate the larger operational systems and the organizational context in which it functions.” (p. 28)  This post presents a summary of the book followed by our perspective on its value.  (The book has a lot of content so this will not be a short post.)

The Second Story

This section establishes the authors' view of error and how socio-technical systems function.  They describe two mutually exclusive world views: (1) “erratic people degrade an otherwise safe system” vs. (2) “people create safety at all levels of the socio-technical system by learning and adapting . . .” (p. 6)  It should be obvious that the authors favor option 2.

In such a world, “Failure, then, represents breakdowns in adaptations directed at coping with complexity.  Indeed, the enemy of safety is not the human: it is complexity.” (p. 1)  “. . . accidents emerge from the coupling and interdependence of modern systems.” (p. 31)

Adaptation occurs in response to pressures or environmental changes.  For example, systems are under stakeholder pressure to become faster, better, cheaper; multiple goals and goal conflict are regular complex system characteristics.  But adaptation is not always successful.  There may be too little (rules and procedures are followed even though conditions have changed) or too much (adaptation is attempted with insufficient information to achieve goals).  Because of pressure, adaptations evolve toward performance boundaries, in particular, safety boundaries.  There is a drift toward failure. (see Dekker, reviewed here)

The authors present 15 premises for analyzing errors in complex socio-technical systems. (pp. 19-30)  Most are familiar but some are worth highlighting and remembering when thinking about system errors:

  • “There is a loose coupling between process and outcome.”  A “bad” process does not always produce bad outcomes and a “good” process does not always produce good outcomes.
  • “Knowledge of outcome (hindsight) biases judgments about process.”  More about that later.
  • “Lawful factors govern the types of erroneous actions or assessments to be expected.”   In other words, “errors are regular and predictable consequences of a variety of factors.”
  • “The design of artifacts affects the potential for erroneous actions and paths towards disaster.”  This is Human Factors 101 but problems still arise.  “Increased coupling increases the cognitive demands on practitioners.”  Increased coupling plus weak feedback can create a latent failure.
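To make the first premise concrete, here is a minimal Python sketch; it is our illustration, not something from the book.  It assumes a hypothetical model in which process quality only shifts the probability of a bad outcome rather than determining it, and the failure probabilities (0.02 for a "good" process, 0.10 for a "bad" one) are invented for illustration.

```python
import random


def simulate_outcomes(failure_prob: float, trials: int, seed: int) -> dict:
    """Count good and bad outcomes for a process with a (hypothetical) failure probability."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    bad = sum(1 for _ in range(trials) if rng.random() < failure_prob)
    return {"good": trials - bad, "bad": bad}


# Invented numbers: a "good" process fails 2% of the time, a "bad" one 10%.
good_process = simulate_outcomes(failure_prob=0.02, trials=1000, seed=1)
bad_process = simulate_outcomes(failure_prob=0.10, trials=1000, seed=2)

print("good process:", good_process)
print("bad process:", bad_process)
```

Over a thousand trials both processes produce both kinds of outcome, so a single bad outcome says little about the quality of the process that produced it; judging the process by that one outcome is the hindsight trap discussed later in the book.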

Complex Systems Failure


This section covers traditional mental models used for assessing failures and points out the putative inadequacies of each.  The sequence-of-events (or domino) model is familiar Newtonian causal analysis.  Man-made disaster theory puts company culture and institutional design at the heart of the safety question.  Vulnerability develops over time but is hidden by the organization’s belief that it has risk under control.  A system or component is driven into failure.  The latent failure (or Swiss cheese) model proposes that “disasters are characterized by a concatenation of several small failures and contributing events. . .” (p. 50)  While a practitioner may be closest to an accident, the associated latent failures were created by system managers, designers, maintainers or regulators.  All these models reinforce the search for human error (someone untrained, inattentive or a “bad apple”) and the customary fixes (more training, procedure adherence and personal attention, or targeted discipline).  They represent a failure to adopt systems thinking and concepts of dynamics, learning, adaptation and the notion that a system can produce accidents as a natural consequence of its normal functioning.

A more sophisticated set of models is then discussed.  Perrow's normal accident theory says that “accidents are the structural and virtually inevitable product of systems that are both interactively complex and tightly coupled.” (p. 61)  Such systems structurally confuse operators and prevent them from recovering when incipient failure is discovered.  People are part of the Perrowian system and can exhibit inadequate expertise.  Control theory sees systems as composed of components that must be kept in dynamic equilibrium based on feedback and continual control inputs, basically a system dynamics view.  Accidents are a result of normal system behavior and occur when components interact to violate safety constraints and the feedback (and control inputs) do not reflect the developing problems.  Small changes in the system can lead to huge consequences elsewhere.  Accident avoidance is based on making system performance boundaries explicit and known, although the goal of efficiency will tend to push operations toward the boundaries.  In contrast, the authors would argue for a different focus: making the system more resilient, i.e., error-tolerant.**  High reliability theory describes how high-hazard activities can achieve safe performance through leadership, closed systems, functional decentralization, safety culture, redundancy and systematic learning.  High reliability means minimal variations in performance, which, in the short term, means safe performance, but high reliability organizations (HROs) are subject to incidents indicative of residual system noise and unseen changes from social forces, information management or new technologies. (See Weick, reviewed here)

Standing on the shoulders of the above sophisticated models, resilience engineering (RE) is proposed as a better way to think about safety.  According to this model, accidents “represent the breakdowns in the adaptations necessary to cope with the real world complexity.” (p. 83)  The authors use the Columbia space shuttle disaster to illustrate patterns of failure evident in complex systems: drift toward failure, past success as reason for continued confidence, fragmented problem-solving, ignoring new evidence and intra-organizational communication breakdowns.  To oppose or compensate for these patterns, RE proposes monitoring or enhancing other system properties including: buffering capacity, flexibility, margin and tolerance (which means replacing quick collapse with graceful degradation).  RE “focuses on what sustains or erodes the adaptive capacities of human-technical systems in a changing environment.” (p. 93)  In practice, that means detecting signs of increasing risk, having resources for safety available, and recognizing when and where to invest to offset risk.  It also requires focusing on organizational decision making, e.g., cross checks for risky decisions, the safety-production-efficiency balance and the reporting and disposition of safety concerns.  “Enhancing error tolerance, detection and recovery together produce safety.” (p. 26)

Operating at the Sharp End

An organization's sharp end is where practitioners apply their expertise in an effort to achieve the organization's goals.  The blunt end is where support functions, from administration to engineering, work.  The blunt end designs the system, the sharp end operates it.  Practitioner performance is affected by cognitive activities in three areas: activation of knowledge, the flow of attention and interactions among multiple goals.

The knowledge available to practitioners arrives as organized content.  Challenges include: the organization of that content may be poor, and the content itself may be incomplete or simply wrong.  Practitioner mental models may be inaccurate or incomplete without the practitioners realizing it, i.e., they may be poorly calibrated.  Knowledge may be inert, i.e., not accessed when it is needed.  Oversimplifications (heuristics) may work in some situations but produce errors in others and limit the practitioner's ability to account for uncertainties or conflicts that arise in individual cases.  The discussion of heuristics suggests Hollnagel, reviewed here.

“Mindset is about attention and its control.” (p. 114)  Attention is a limited resource.  Problems with maintaining effective attention include loss of situational awareness, in which the practitioner's mental model of events doesn't match the real world, and fixation, where the practitioner's initial assessment of a situation creates a going-forward bias against accepting discrepant data and a failure to trigger relevant inert knowledge.  Mindset seems similar to HRO mindfulness. (see Weick)

Goal conflict can arise from many sources including management policies, regulatory requirements, economic (cost) factors and risk of legal liability.  Decision making must consider goals (which may be implicit), values, costs and risks—which may be uncertain.  Normalization of deviance is a constant threat.  Decision makers may be held responsible for achieving a goal but lack the authority to do so.  The conflict between cost and safety may be subtle or unrecognized.  “Safety is not a concrete entity and the argument that one should always choose the safest path misrepresents the dilemmas that confront the practitioner.” (p. 139)  “[I]t is difficult for many organizations (particularly in regulated industries) to admit that goal conflicts and tradeoff decisions arise.” (p. 139)  Overall, the authors present a good discussion of goal conflict.

How Design Can Induce Error


The design of computerized devices intended to help practitioners can instead lead to greater risks of errors and incidents.  Specific causes of problems include clumsy automation, limited information visibility and mode errors. 

Automation is supposed to increase user effectiveness and efficiency.  However, clumsy automation creates situations where the user loses track of what the computer is set up to do, what it's doing and what it will do next.  If support systems are so flexible that users can't know all their possible configurations, they adopt simplifying strategies which may be inappropriate in some cases.  Clumsy automation leads to more (instead of less) cognitive work, diverts user attention to the machine instead of the task, increases the potential for new kinds of errors, and creates the need for new user knowledge and judgments.  The machine effectively has its own model of the world, based on user inputs, data sensors and internal functioning, and passes that back to the user.

Machines often hide a mass of data behind a narrow keyhole of visibility into the system.  Successful design creates “a visible conceptual space meaningfully related to activities and constraints in a field of practice.” (p. 162)  In addition, “Effective representations highlight 'operationally interesting' changes for sequences of behavior . . .” (p. 167)  However, default displays typically do not make interesting events directly visible.

A mode error occurs when an operator initiates an action that would be appropriate if the machine were in mode A but, in fact, it's in mode B.  (This may be a man-machine problem but it's not the machine's fault.)  A machine can change modes based on situational and system factors in addition to operator input.  Operators have to maintain mode awareness, not an easy task when viewing a small, cluttered display that may not highlight current mode or mode changes.
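The mechanism can be sketched in a few lines of Python.  This is a hypothetical illustration, not an example from the book: `ModalController`, its two modes ("TARGET" and "RATE") and its automatic mode switch are all invented to show how an input that is correct in one mode becomes an error in another when a mode change happens without the operator noticing.

```python
from dataclasses import dataclass


@dataclass
class ModalController:
    """Hypothetical device: one input knob whose meaning depends on the current mode."""
    mode: str = "TARGET"  # "TARGET": input is an absolute level; "RATE": input is a change
    level: float = 50.0

    def enter(self, value: float) -> float:
        # The same operator input produces different machine actions depending on mode.
        if self.mode == "TARGET":
            self.level = value
        elif self.mode == "RATE":
            self.level += value
        else:
            raise ValueError(f"unknown mode: {self.mode}")
        return self.level

    def on_condition(self, high_demand: bool) -> None:
        # The mode can change without operator input, based on situational factors.
        if high_demand:
            self.mode = "RATE"


device = ModalController()
device.on_condition(high_demand=True)  # silent, automatic mode change
result = device.enter(40.0)            # operator intends "set the level to 40"
print(device.mode, result)
```

Run as written, it prints `RATE 90.0`: the operator meant to set the level to 40, but the unnoticed switch to RATE mode turned the same input into a 40-unit increase.  A display that made the current mode and any automatic mode change prominent is one possible way to support the mode awareness the authors say operators need.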

To cope with bad design “practitioners adapt information technology provided for them to the immediate tasks at hand in a locally pragmatic way, . . .” (p. 191)  They use system tailoring where they adapt the device, often by focusing on a feature set they consider useful and ignoring other machine capabilities.  They use task tailoring where they adapt strategies to accommodate constraints imposed by the new technology.  Both types of adaptation can lead to success or eventual failures. 

The authors suggest various countermeasures and design changes to address these problems. 

Reactions to Failure

Different approaches for analyzing accidents lead to different perspectives on human error. 

Hindsight bias is “the tendency for people to 'consistently exaggerate what could have been anticipated in foresight.'” (p. 15)  It reinforces the tendency to look for the human in the human error.  Operators are blamed for bad outcomes because they are available, tracking back to multiple contributing causes is difficult, most system performance is good and investigators tend to judge process quality by its outcome.  Outsiders tend to think operators knew more about their situation than they actually did.  Evaluating process instead of outcome is also problematic: process and outcome are loosely coupled, and what standards should be used to evaluate a process?  Formal work descriptions “underestimate the dilemmas, interactions between constraints, goal conflicts, and tradeoffs present in the actual workplace.” (p. 208)  A suggested alternative approach is to ask what other practitioners would have done in the same situation and build a set of contrast cases.  “What we should not do, . . . is rely on putatively objective external evaluations . . . such as . . . court cases or other formal hearings.  Such processes in fact institutionalize and legitimate the hindsight bias . . . leading to blame and a focus on individual actors at the expense of a system view.” (pp. 213-214)

Distancing through differencing is another risk.  In this practice, reviewers focus on differences between the context surrounding an accident and their own circumstances.  Blaming individuals reinforces the belief that there are no lessons to be learned for other organizations.  If human error is local and individual (as opposed to systemic) then sanctions, exhortations to follow the procedures and remedial training are sufficient fixes.  There is a decent discussion of TMI here, where, in the authors' opinion, the initial sense of fundamental surprise and need for socio-technical fixes was soon replaced by a search for local, technologically-focused solutions.
      
There is often pressure to hold people accountable after incidents or accidents.  One answer is a “just culture,” which views incidents as system learning opportunities but also draws a line between acceptable and unacceptable behavior.  Since the “line” is an attribution, the key question for any organization is who gets to draw it.  Another challenge is defining the discretionary space where individuals alone have the authority to decide how to proceed.  There is more on just culture but this is all (or mostly) Dekker. (see our Just Culture commentary here)

The authors' recommendations for analyzing errors and improving safety can be summed up as follows: recognize that human error is an attribution; pursue second stories that reveal the multiple, systemic contributors to failure; avoid hindsight bias; understand how work really gets done; search for systemic vulnerabilities; study how practice creates safety; search for underlying patterns; examine how change will produce new vulnerabilities; use technology to enhance human expertise; and tame complexity. (p. 239)  “Safety is created at the sharp end as practitioners interact with hazardous processes . . . using the available tools and resources.” (p. 243)

Our Perspective

This is a book about organizational characteristics and socio-technical systems.  Recommendations and advice are aimed at organizational policy makers and incident investigators.  The discussion of a “just culture” is the only place where culture is discussed in detail, although safety culture is mentioned in passing in the HRO write-up.

Our first problem with the book is its repeated characterization of medicine, aviation, aircraft carrier operations and nuclear power plants as complex systems.***  Although medicine is definitely complex and aviation (including air traffic control) possibly is, carrier operations and nuclear power plants are simply complicated.  While carrier and nuclear personnel have to make some adaptations on the fly, they do not face sudden, disruptive changes in their technologies or operating environments, and they are not exposed to cutthroat competition.  Their operations are tightly coordinated but, where possible, by design more loosely coupled to facilitate recovery if operations start to go sour.  In addition, calling nuclear power operations complex perpetuates the myth that nuclear is “unique and special” and thus merits some special place in the pantheon of industry.  It isn't and it doesn't.

Our second problem relates to the authors' recasting of the nature of human error.  We decry the rush to judgment after negative events, particularly a search limited to identifying culpable humans.  The search for bad apples or outright criminals satisfies society's perceived need to bring someone to justice, and the corporate system's desire to appear to fix things through management exhortations and training without really admitting systemic problems or changing anything substantive, e.g., the management incentive plan.  The authors' plea for more systemic analysis is thus welcome.

But they push the pendulum too far in the opposite direction.  They appear to advocate replacing all human errors (except for gross negligence, willful violations or sabotage) with systemic explanations, aka rationalizations.  What is never mentioned is that medical errors lead to tens of thousands of preventable deaths per year.****  In contrast, U.S. commercial aviation has not experienced over a hundred fatalities (excluding 9/11) since 1996; carriers and nuclear power plants experience accidents, but there are few fatalities.  At worst, this book is a denial that real human errors (including bad decisions, slip-ups, impairments and cover-ups) occur, and a rationalization of medical mistakes caused by arrogance, incompetence, class structure and lack of accountability.

This is a dense book, 250 pages of small print, with an index that is nearly useless.  Pressures (most likely cost and schedule) have apparently pushed publishing to the system boundary for copy editing: there are extra, missing and wrong words throughout the text.

This 2010 second edition updates the original 1994 monograph.  Many of the original ideas have been fleshed out elsewhere by the authors (primarily Dekker) and others.  Some references, e.g., Hollnagel, Perrow and the HRO school, should be read in their original form. 


*  D.D. Woods, S. Dekker, R. Cook, L. Johannesen and N. Sarter, Behind Human Error, 2d ed.  (Ashgate, Burlington, VT: 2010).  Thanks to Bill Mullins for bringing this book to our attention.

**  There is considerable overlap of the perspectives of the authors and the control theorists (Leveson and Rasmussen are cited in the book).  As an aside, Dekker was a dissertation advisor for one of Leveson's MIT students.

***  The authors' different backgrounds contribute to this mash-up.  Cook is a physician, Dekker is a pilot and some of Woods' cited publications refer to nuclear power (and aviation).

****  M. Makary, “How to Stop Hospitals From Killing Us,” Wall Street Journal online (Sept. 21, 2012).  Retrieved July 4, 2013.