Thursday, August 29, 2013

Normal Accidents by Charles Perrow

This book*, originally published in 1984, is a regular reference for authors writing about complex socio-technical systems.**  Perrow's model for classifying such systems is intuitively appealing; it appears to reflect the reality of complexity without forcing the reader to digest a deliberately abstruse academic construct.  We will briefly describe the model, then spend most of our space discussing our problems with Perrow's inferences and assertions, focusing on nuclear power.

The Model

The model is a 2x2 matrix with axes of coupling and interactions.  Not surprisingly, it is called the Interaction/Coupling (I/C) chart.

“Coupling” refers to the amount of slack, buffer or give between two items in a system.  Loosely coupled systems can accommodate shocks, failures and pressures without destabilizing.  Tightly coupled systems have a higher risk of disastrous failure because their processes are more time-dependent, with invariant sequences and a single way of achieving the production goal, and have little slack. (pp. 89-94)

“Interactions” may be linear or complex.  Linear interactions are between a system component and one or more other components that immediately precede or follow it in the production sequence.  These interactions are familiar and, if something unplanned occurs, the results are easily visible.  Complex interactions are between a system component and one or more other components outside the normal production sequence.  If unfamiliar, unplanned or unexpected sequences occur, the results may not be visible or immediately comprehensible. (pp. 77-78)

Nuclear plants have the tightest coupling and most complex interactions of the two dozen systems Perrow shows on the I/C chart, a population that includes chemical plants, space missions and nuclear weapons accidents. (p. 97)
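For readers who like to see a model expressed as code, here is a minimal sketch (ours, not Perrow's) of how the two axes might be represented and a system assigned to a quadrant.  The numeric scores are placeholders; as noted later, Perrow placed systems on the chart by subjective judgment, not measurement, so the value of any such exercise lies in the quadrant comparison, not the numbers.

```python
# Minimal sketch (ours) of Perrow's two axes as data.  The scores are
# illustrative placeholders, not measurements from the book.

from dataclasses import dataclass

@dataclass
class SystemProfile:
    name: str
    coupling: float      # 0 = very loose, 1 = very tight
    interactions: float  # 0 = purely linear, 1 = highly complex

    def quadrant(self) -> str:
        c = "tight" if self.coupling >= 0.5 else "loose"
        i = "complex" if self.interactions >= 0.5 else "linear"
        return f"{c} coupling / {i} interactions"

# The nuclear plant placement follows Perrow's chart; the other two entries
# are hypothetical systems included only to populate other quadrants.
examples = [
    SystemProfile("nuclear power plant", 0.95, 0.95),
    SystemProfile("hypothetical batch manufacturer", 0.30, 0.20),
    SystemProfile("hypothetical research lab", 0.25, 0.80),
]
for s in examples:
    print(f"{s.name}: {s.quadrant()}")
```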

Perrow on Nuclear Power

Let's get one thing out of the way immediately: Normal Accidents is an anti-nuke screed.  Perrow started the book in 1979 and it was published in 1984.  He was motivated to write the book by the TMI accident and it obviously colored his forecast for the industry.  He reviews the TMI accident in detail, then describes nuclear industry characteristics and incidents at other plants, all of which paint an unfavorable portrait of the industry.  He concludes: “We have not had more serious accidents of the scope of Three Mile Island simply because we have not given them enough time to appear.” (p. 60, emphasis added)  While he is concerned with design, construction and operating problems, his primary fear is “the potential for unexpected interactions of small failures in that system that makes it prone to the system accident.” (p. 61)   

Why has his prediction of such serious accidents not come to pass, at least in the U.S.?

Our Perspective on Normal Accidents

We have several issues with this book and the author's “analysis.”

Nuclear is not as complex as Perrow asserts 


There is no question that the U.S. nuclear industry grew quickly, with upsized plants and utilities specifying custom design combinations (in other words, limited standardization).  The utilities were focused on meeting significant load growth forecasts and saw nuclear baseload capacity as an efficient way to produce electric power.  However, actually operating a large nuclear plant was probably more complex than the utilities realized.  But that is no longer the case.  Learning curve effects, more detailed procedures and improved analytic methods are a few of the factors that led to a greater knowledge base for plant decision making.  The serious operational issues at the “problem plants” (circa 1997) forced operators to confront the reality that identifying and permanently resolving plant problems was necessary for survival.  This era also saw the beginning of industry consolidation, with major operators applying best methods throughout their fleets.  All of these changes have led to our view that nuclear plants are certainly complicated but no longer complex and haven't been for some time.

This is a good place to point out that Perrow's designation of nuclear plants as the most complex and most tightly coupled systems he evaluated has no basis in any real science.  In his own words, “The placement of systems [on the interaction/coupling chart] is based entirely on subjective judgments on my part; at present there is no reliable way to measure these two variables, interaction and coupling.” (p. 96)

System failures with incomprehensible consequences are not the primary problem in the nuclear industry

The 1986 Chernobyl disaster was arguably a system failure: poor plant design, personnel non-compliance with rules and a deficient safety culture.  It was a serious accident but not a catastrophe.*** 

But other significant industry events have not arisen from interactions deep within the system; they have come from negligence, hubris, incompetence or selective ignorance.  For example, Fukushima was overwhelmed by a tsunami that was known to be possible but was ignored by the owners.  At Davis-Besse, personnel ignored increasingly strong signals of a nascent problem, managers argued that in-depth investigation could wait until the next outage (production trumps safety), and the NRC agreed (with no solid justification).

Important system dynamics are ignored 


Perrow has some recognition of what a system is and how threats can arise within it: “. . . it is the way the parts fit together, interact, that is important.  The dangerous accidents lie in the system, not in the components.” (p. 351)  However, his focus is on interactions and couplings as they exist at a given point in time.  But a socio-technical system is constantly changing (evolving, learning) in response to internal and external stimuli.  Internal stimuli include management decisions and the reactions to performance feedback signals; external stimuli include environmental demands, constraints, threats and opportunities.  Complacency and normalization of deviance can seep in but systems can also bolster their defenses and become more robust and resilient.****  It would be a stretch to say that nuclear power has always learned from its mistakes (especially if they occur at someone else's plant) but steps have been taken to make operations less complex.

My own bias is that Perrow doesn't really appreciate the technical side of a socio-technical system.  He recounts incidents in great detail, but not at great depth, and is often relying on the work of others.  Although he claims the book is about technology (the socio side, aka culture, is never mentioned), the fact remains that he is not an engineer or physicist; he is a sociologist.

Conclusion

Notwithstanding all my carping, this is a significant book.  It is highly readable.  Perrow's discussion of accidents, incidents and issues in various contexts, including petrochemical plants, air transport, marine shipping and space exploration, is fascinating reading.  His interaction/coupling chart is a useful mental model to help grasp relative system complexity although one must be careful about over-inferring from such a simple representation.

There are some useful suggestions, e.g., establishing an anonymous reporting system, similar to the one used in the air transport industry, for nuclear near-misses. (p. 169)  There is a good discussion of decentralization vs centralization in nuclear plant organizations. (pp. 334-5)  But he says that neither is best all the time, which he considers a contradiction.  The possibility of contingency management, i.e., using a decentralized approach for normal times and tightening up during challenging conditions, is regarded as infeasible.

Ultimately, he includes nuclear power with “systems that are hopeless and should be abandoned because the inevitable risks outweigh any reasonable benefits . . .” (p. 304)*****  As further support for this conclusion, he reviews three different ways of evaluating the world: absolute, bounded and social rationality.  Absolute rationality is the province of experts; bounded rationality recognizes resource and cognitive limitations in the search for solutions.  But Perrow favors social rationality (which we might unkindly call crowdsourced opinions) because it is the most democratic and, not coincidentally, he can cite a study that shows an industry's “dread risk” is highly correlated with its position on the I/C chart. (p. 326)  In other words, if lots of people are fearful of nuclear power, no matter how unreasonable those fears are, that is further evidence to shut it down.

The 1999 edition of Normal Accidents has an Afterword that updates the original version.  Perrow continues to condemn nuclear power but without much new data.  Much of his disapprobation is directed at the petrochemical industry.  He highlights writers who have advanced his ideas and also presents his (dis)agreements with high reliability theory and Vaughan's interpretation of the Challenger accident.

You don't need this book in your library but you do need to be aware that it is a foundation stone for the work of many other authors.

 

*  C. Perrow, Normal Accidents: Living with High-Risk Technologies (Princeton Univ. Press, Princeton, NJ: 1999).

**  For example, see Erik Hollnagel, The ETTO Principle: Efficiency-Thoroughness Trade-Off (reviewed here); Woods, Dekker et al, Behind Human Error (reviewed here); and Weick and Sutcliffe, Managing the Unexpected: Resilient Performance in an Age of Uncertainty (reviewed here).  It's ironic that Perrow set out to write a readable book without references to the “sacred texts” (p. 11) but it appears Normal Accidents has become one.

***  Perrow's criteria for catastrophe appear to be: “kill many people, irradiate others, and poison some acres of land.” (p. 348)  While any death is a tragedy, reputable Chernobyl studies report fewer than 100 deaths from radiation and project 4,000 radiation-induced cancers in a population of 600,000 people who were exposed.  The same population is expected to suffer 100,000 cancer deaths from all other causes.  Approximately 40,000 square miles of land was significantly contaminated.  Data from Chernobyl Forum, "Chernobyl's Legacy: Health, Environmental and Socio-Economic Impacts" 2nd rev. ed.  Retrieved Aug. 27, 2013.  Wikipedia, “Chernobyl disaster.”  Retrieved Aug. 27, 2013.

In his 1999 Afterword to Normal Accidents, Perrow mentions Chernobyl in passing and his comments suggest he does not consider it a catastrophe, although it could have been one had the wind blown the radioactive materials over the city of Kiev.

****  A truly complex system can drift into failure (Dekker) or experience incidents from performance excursions outside the safety boundaries (Hollnagel).

*****  It's not just nuclear power; Perrow also supports unilateral nuclear disarmament. (p. 347)

Thursday, August 15, 2013

No Innocent Bystanders

The stake that sticks up gets hammered down.
We recently saw an article* about organizational bystander behavior.  Organizational bystanders are people who sense or believe that something is wrong—a risk is increasing or a hazard is becoming manifest—but they don't force their organization to confront the issue or they only halfheartedly pursue it.**  This is a significant problem in high-hazard activities; it seems that after a serious incident occurs, there is always someone, or even several someones, who knew the incident's causes existed but didn't say anything.  Why don't these people speak up?

The authors describe psychological and organizational factors that encourage bystander behavior.  Psychological factors are rooted in uncertainty, observing the failure of others to act and the expectation that expert or formal authorities will address the problem.  Fear is a big factor: fear of being wrong, fear of being chastised for thinking above one's position or outside one's field of authority, fear of being rejected by the work group even if one's concerns are ultimately shown to be correct or fear of being considered disloyal; in brief, fear of the dominant culture. 

Organizational factors include the processes and constraints the organization uses to filter information and make decisions.  Such factors include limiting acceptable information to that which comports with the organization's basic assumptions, and rigid hierarchical and role structures—all components of the organization's culture.  Other organizational factors, e.g., resource constraints and external forces, apply pressure on the culture.  In one type of worst case, “imposing nonnegotiable performance objectives combined with severe sanctions for failure encourages the violation of rules, reporting distortions, and dangerous, sometimes illegal short-cuts.” (p. 52)  Remember Massey Energy and the Upper Big Branch mine disaster?

The authors provide a list of possible actions to mitigate the likelihood of bystander behavior.  Below we recast some of these actions as desirable organizational (or cultural) attributes.

  • Mechanisms exist for encouraging and expressing dissenting points of view;
  • Management systems balance the need for short-term performance with the need for productive inquiry into potential threats;
  • Approaches exist to follow up on near-misses and other “weak signals” [an important attribute of high reliability organizations];
  • Disastrous but low probability events are identified and contingency plans prepared;
  • Performance reviews, self-criticism, and a focus on learning at all levels are required.
Even in such an improved world, “bystander behavior is not something that can be 'fixed' once and for all, as it is a natural outgrowth of the interplay of human psychology and organizational forces. The best we can hope for is to manage it well, and, by so doing, help to prevent catastrophic outcomes.” (p. 53)

Our Perspective

This paper presents a useful discussion of the interface between the individual and the organization under problematic conditions, viz., when the individual sees something that may be at odds with the prevailing world view.  It's important to realize that even if the organizational factors are under control, many people will still be reluctant to rock the boat, even when the risk they see is to the boat itself.

The authors correctly emphasize the important role of leadership in developing the desirable organizational attributes; however, as we have argued elsewhere, leadership can influence, but not unilaterally specify, organizational culture.

We would like to see more discussion of systemic processes.  For example, the impact of possible negative feedback on the individual is described but positive feedback, such as through the compensation, recognition and reward systems, is not discussed.  Organizational learning (adaptation) is mentioned but not well developed.

The article mentions the importance of independent watchdogs.  We note that in the nuclear industry, the regulator plays an important role in encouraging bystanders to get involved and protecting them if they do.

The article concludes with a section on the desirable contributions of the human resources (HR) department.  It is, quite frankly, unrealistic (it overstates the role and authority of HR in nuclear organizations I have seen) but was probably necessary to get the article published in an HR journal. 


*  M.S. Gerstein and R.B. Shaw, “Organizational Bystanders,” People and Strategy 31, no. 1 (2008), pp. 47-54.  Thanks to Madalina Tronea for publicizing this article on the LinkedIn Nuclear Safety group.  Dr. Tronea is the group's founder/manager.

**  This is a bit different from the classic bystander effect which refers to a situation where the more people present when help is needed, the less likely any one of them is to provide the help, each one expecting others to provide assistance. 

Wednesday, August 7, 2013

Nuclear Industry Scandal in South Korea

As you know, over the past year trouble has been brewing in the South Korean nuclear industry.  A recent New York Times article* provides a good current status report.  The most visible problem is the falsification of test documents for nuclear plant parts.  Executives have been fired, and employees of both a testing company and the state-owned entity that inspects parts and validates their safety certificates have been indicted.

It should be no surprise that the underlying causes are rooted in the industry structure and culture.  South Korea has only one nuclear utility, state-owned Korea Electric Power Corporation (Kepco).  Kepco retirees go to work for parts suppliers or invest in them.  Cultural attributes include valuing personal ties over regulations, and school and hometown connections.  Bribery is used as a lubricating agent.

As a consequence,  “In the past 30 years, our nuclear energy industry has become an increasingly closed community that emphasized its specialty in dealing with nuclear materials and yet allowed little oversight and intervention,” the government’s Ministry of Trade, Industry and Energy said in a recent report to lawmakers. “It spawned a litany of corruption, an opaque system and a business practice replete with complacency.”

Couldn't happen here, right?  I hope not, but the U.S. nuclear industry, while not as closed a system as its Korean counterpart, is hardly an open community.  The “unique and special” mantra promotes insular thinking and encourages insiders to view outsiders with suspicion.  The secret practices of the industry's self-regulator do not inspire public confidence.  A familiar cast of NEI/INPO participants at NRC stakeholder meetings fuels concern over the degree to which the NRC has been captured by industry.  Utility business decisions that ultimately killed plants (CR3, Kewaunee, San Onofre) appear to have been made in conference rooms isolated from any informed awareness of worst-case technical/commercial consequences.  Our industry has many positive attributes, but others should cause us to stop and reflect.

*  C. Sang-Hun, “Scandal in South Korea Over Nuclear Revelations,” New York Times (Aug. 3, 2013).  Retrieved Aug. 6, 2013.

Tuesday, July 30, 2013

Introducing NuclearSafetySim

We have referred to NuclearSafetySim and the use of simulation tools on a regular basis in this blog.  NuclearSafetySim is our initiative to develop a new approach to safety management training for nuclear professionals.  It utilizes a simulator to provide a realistic nuclear operations environment within which players are challenged by emergent issues over a five-year period, making decisions that balance safety implications against other priorities.  Each player earns an overall score and is provided with analyses and data on his/her decision making and performance against goals.  It is clearly a different approach to safety culture training, one that attempts to operationalize the values and traits espoused by various industry bodies.  In that regard it is exactly what nuclear professionals must do on a day-to-day basis.

At this time we are making NuclearSafetySim available to our readers through a web-based demo version.  To get started you need to access the NuclearSafetySim website.  Click on the Introduction tab at the top of the Home page.  Here you will find a link to a narrated slide show that provides important background on the approach used in the simulation.  It runs about 15 minutes.  Then click on the Simulation tab.  Here you will find another video, which is a demo of NuclearSafetySim.  While this runs about 45 minutes (apologies), it does provide a comprehensive tutorial on the sim and how to interact with it.  We urge you to view it.  Finally...at the bottom of the Simulation page is a link to the NuclearSafetySim tool.  Clicking on the link brings you directly to the Home screen and you’re ready to play.

As you will see on the website and in the sim itself, there are reminders and links to facilitate providing feedback on NuclearSafetySim and/or requesting additional information.  This is important to us and we hope our readers will take the time to provide thoughtful input, including constructive criticism.  We welcome all comments. 

Wednesday, July 24, 2013

Leadership, Culture and Organizational Performance

As discussed in our July 18, 2013 post, INPO's position is that creating and maintaining a healthy safety culture (SC) is a primary leadership responsibility.*  That seems like a common sense belief but is it based on any social science?  What is the connection between leader behavior and culture?  And what is the connection between culture and organizational performance? 

To help us address these questions, we turn to a paper** by some Stanford and UC Berkeley academics.  They review the relevant literature and present their own research and findings.  This paper is not a great fit with nuclear power operations but some of the authors' observations and findings are useful.  One might think there would be ample materials on this important topic but “only a very few studies have actually explored the interrelationships among leadership, culture and performance.” (p. 33)

Leaders and Culture


Leaders can be described by different personality types.  Note this does not focus on specific behavior, e.g., how they make decisions, but the attributes of each personality type certainly imply the kinds of behavior that can reasonably be expected.  The authors contend “. . . the myriad of potential personality and value constructs can be reliably captured by five essential personality constructs, the so-called Big Five or the Five Factor Model . . .” (p. 6)  You have all been exposed to the Big 5, or a similar, taxonomy.  An individual may exhibit attributes from more than one type but can ultimately be classified as primarily representative of one specific type.  The five types are listed below, with a few selected attributes for each.
  • Agreeableness (Cooperative, Compromising, Compassionate, Trusting)
  • Conscientiousness (Orderly, Reliable, Achievement oriented, Self-disciplined, Deliberate, Cautious)
  • Extraversion (Gregarious, Assertive, Energy, Optimistic)
  • Neuroticism (Negative affect, Anxious, Impulsive, Hostile, Insecure)
  • Openness to Experience (Insightful, Challenge convention, Autonomous, Resourceful)

Leaders can affect culture and later we'll see that some personality types are associated with specific types of organizational culture.  “While not definitive, the evidence suggests that personality as manifested in values and behavior is associated with leadership at the CEO level and that these leader attributes may affect the culture of the organization, although the specific form of these relationships is not clear.” (p. 10)  “. . . senior leaders, because of their salience, responsibility, authority and presumed status, have a disproportionate impact on culture, . . .” (p. 11)

Culture and Organizational Performance

Let's begin with a conclusion: “One of the most important yet least understood questions is how organizational culture relates to organizational performance.” (p. 11)

To support their research model, the authors describe a framework, similar to the Big 5 for personality, for summarizing organizational cultures.  The Organizational Culture Profile (OCP) features seven types of culture, listed below with a few selected attributes for each. 

  • Adaptability (Willing to experiment, Taking initiative, Risk taking, Innovative)
  • Collaborative (Team-oriented, Cooperative, Supportive, Low levels of conflict)
  • Customer-oriented (Listening to customers, Being market driven)
  • Detail-oriented (Being precise, Emphasizing quality, Being analytical)
  • Integrity (High ethical standards, Being honest)
  • Results-Oriented (High expectations for performance, Achievement oriented, Not easy going)
  • Transparency (Putting the organization’s goals before the unit, Sharing information freely)
The linkage between culture and performance is fuzzy.  “While the strong intuition was that organizational culture should be directly linked to firm effectiveness, the empirical results are equivocal.” (p. 14)  “[T]he association of culture and performance is not straightforward and likely to be contingent on the firm’s strategy, the degree to which the culture promotes adaptability, and how widely shared and strongly felt the culture is.” (p. 17)  “Further compounding the issue is that the relationship between culture and firm performance has been shown to vary across industries.” (p. 11)  Finally, “although the [OCP] has the advantage of identifying a comprehensive set of cultural dimensions, there is no guarantee that any particular dimension will be relevant for a particular firm.” (p. 18)  I think it's fair to summarize the culture-performance literature by saying “It all depends.” 

Research Results

The authors gathered and analyzed data on a group of high-technology firms: CEO personalities based on the Big 5 types, cultural descriptions using the OCP, and performance data.  Firm performance was based on financial metrics, firm reputation (an intangible asset) and employee attitudes.*** (pp. 23-24)

“[T]he results reveal a number of significant relationships between CEO personality and firm culture, . . . CEOs who were more extraverted (gregarious, assertive, active) had cultures that were more results-oriented. . . . CEOs who were more conscientious (orderly, disciplined, achievement-oriented) had cultures that were more detail-oriented . . . CEOs who were higher on openness to experience (ready to challenge convention, imaginative, willing to try new activities) [were] more likely to have cultures that emphasized adaptability.” (p. 26)

“Cultures that were rated as more adaptable, results-oriented and detail-oriented were seen more positively by their employees. Firms that emphasized adaptability and were more detail-oriented were also more admired by industry observers.” (p. 28)
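The reported relationships can be collapsed into a simple lookup.  The sketch below is our own schematic restatement of the associations quoted above, not the authors' statistical analysis, and it says nothing about causation.

```python
# Schematic restatement (ours) of the reported CEO personality -> culture
# associations and the cultures' observed correlates.  It is a summary table,
# not the authors' model, and implies no causal direction.

ceo_trait_to_culture = {
    "extraversion": "results-oriented",
    "conscientiousness": "detail-oriented",
    "openness to experience": "adaptability",
}

culture_correlates = {
    "results-oriented": ["seen more positively by employees"],
    "detail-oriented": ["seen more positively by employees",
                        "more admired by industry observers"],
    "adaptability": ["seen more positively by employees",
                     "more admired by industry observers"],
}

for trait, culture in ceo_trait_to_culture.items():
    notes = "; ".join(culture_correlates[culture])
    print(f"CEO high in {trait} -> {culture} culture ({notes})")
```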

In sum, the linkage between leadership and performance is far from clear.  But “consistent patterns of [CEO] behavior shape interpretations of what’s important [values] and how to behave. . . . Other research has shown that a CEO’s personality may affect choices of strategy and structure.” (p. 31)

Relevance to Nuclear Operations


As mentioned in the introduction, this paper is not a great fit with the nuclear industry.  The authors' research focuses on high-technology companies, there is nothing SC-specific and their financial performance metrics (more important to firms in highly competitive industries) are more robust than their non-financial measures.  Safety performance is not mentioned.

But their framework stimulates us to ask important questions.  For example, based on the research results, what type of CNO would you select for a plant with safety performance problems?  How about one facing significant economic challenges?  Or one where things are running smoothly?  Based on the OCP, what types of culture would be most supportive of a strong SC?  Would any types be inconsistent with a strong SC?  How would you categorize your organization's culture?  

The authors suggest that “Senior leaders may want to consider developing the behaviors that cultivate the most useful culture for their firm, even if these behaviors do not come naturally to them.” (p. 35)  Is that desirable or practical for your CNO?

The biggest challenge to obtaining generalizable results, which the authors recognize, is that so many driving factors are situation-specific, i.e., dependent on a firm's industry, competitive position and relative performance.  They also recognize a possible weakness in linear causality, i.e., the leadership → culture → performance logic may not be one-way.  In our systems view, we'd say there are likely feedback loops, two-way influence flows and additional relevant variables in the overall model of the organization.

The linear (Newtonian) viewpoint promoted by INPO suggests that culture is mostly (solely?) created by senior executives.  If only it were that easy.  Such a view “runs counter to the idea that culture is a social construct created by many individuals and their behavioral patterns.” (p. 10)  We believe culture, including SC, is an emergent organizational property created by the integration of top-down activities with organizational history, long-serving employees, and strongly held beliefs and values, including the organization's “real” priorities.  In other words, SC is a result of the functioning over time of the socio-technical system.  In our view, a CNO can heavily influence, but not unilaterally define, organizational culture including SC.
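To make the structural point concrete, here is a toy sketch with arbitrary coefficients, entirely ours and not a validated model: leadership emphasis, culture and performance all influence one another instead of flowing one way.  The only claim is the presence of the feedback loops.

```python
# Toy illustration (ours) of two-way influence: culture responds to both
# leadership emphasis and recent performance, performance responds to culture,
# and leaders adjust their emphasis based on performance feedback.
# Coefficients are arbitrary; only the loop structure matters.

def simulate(steps: int = 20):
    leadership_emphasis, culture, performance = 0.8, 0.5, 0.5
    for _ in range(steps):
        culture += 0.2 * (leadership_emphasis - culture) + 0.1 * (performance - culture)
        performance += 0.15 * (culture - performance)
        leadership_emphasis += 0.05 * (performance - leadership_emphasis)
    return round(leadership_emphasis, 3), round(culture, 3), round(performance, 3)

print(simulate())  # all three variables co-evolve; none is set unilaterally
```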



*  As another example of INPO's position, a recent presentation by an INPO staffer ends with an Ed Schein quote: “...the only thing of real importance that leaders do is to create and manage culture...”  The quote is from Schein's Organizational Culture and Leadership (San Francisco, CA: Jossey-Bass, 1985), p. 2.  The presentation was A. Daniels, “How to Continuously Improve Cultural Traits for the Management of Safety,” IAEA International Experts’ Meeting on Human and Organizational Factors in Nuclear Safety in the Light of the Accident at the Fukushima Daiichi Nuclear Power Plant, Vienna May 21-24, 2013.
 

**  C. O’Reilly, D. Caldwell, J. Chatman and B. Doerr, “The Promise and Problems of Organizational Culture: CEO Personality, Culture, and Firm Performance,” working paper (2012).  Retrieved July 22, 2013.  To enhance readability, in-line citations have been removed from quotes.

***  The authors report “Several studies show that culture is associated with employee attitudes . . . ” (p. 14)

Thursday, July 18, 2013

INPO: Traits of a Healthy Nuclear Safety Culture

The Institute of Nuclear Power Operations (INPO) has released a document* that aims at aligning their previous descriptions of safety culture (SC) with current NRC SC terminology.  The document describes the essential traits and attributes of a healthy** nuclear SC.  “[A] trait is defined as a pattern of thinking, feeling, and behaving such that safety is emphasized over competing priorities. . . . The attributes clarify the intent of the traits.” (p. 3)  While there is an effort to align with NRC, the document remains consistent with INPO policy, viz., SC is a primary leadership responsibility.  Leaders are expected to regularly reinforce SC, measure SC in their organization and communicate what constitutes a healthy SC.

There are ten traits organized into three categories.  Each trait has multiple attributes and each attribute has representative observable behaviors that are supposed to evidence the attribute's existence, scope and strength.  Many of the behaviors stress management's responsibilities.  The report has too much detail to summarize in this post so we'll concentrate on one of the key SC artifacts we have repeatedly emphasized on this blog: decision making.

Decision making (DM) is one of the ten traits.  DM has three attributes: a consistent process, conservative bias and single-point accountability.  Risk insights are incorporated as appropriate.  Observable behaviors include: the organization establishes a well-defined DM process; individuals demonstrate an understanding of the DM process; leaders seek inputs from different work groups or organizations; when previous decisions are called into question by new facts, leaders reevaluate these decisions; conservative assumptions are used when determining whether emergent or unscheduled work can be conducted safely; leaders take a conservative approach to DM, particularly when information is incomplete or conditions are unusual; managers take timely action to address degraded conditions; executives and senior managers reinforce the expectation that the reactor will be shut down when procedurally required, when the margin for safe operation has degraded unacceptably, or when the condition of the reactor is uncertain; individuals do not rationalize assumptions for the sake of completing a task; and the organization ensures that important nuclear safety decisions are made by the correct person at the lowest appropriate level. (pp. 19-20)  That's quite a mouthful but it's not all of the behaviors and some of the included ones have been shortened to fit.
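Below is a minimal sketch (ours) of the trait-attribute-behavior hierarchy, populated with a few of the decision-making items paraphrased above.  The grouping of behaviors under attributes is our guess, and encoding the structure this way only underscores how easily it could be treated as a checklist.

```python
# Sketch (ours) of the trait -> attributes -> observable behaviors hierarchy.
# The assignment of behaviors to attributes is our own guess, for illustration.

decision_making = {
    "trait": "decision making",
    "attributes": {
        "consistent process": [
            "a well-defined decision-making process is established",
            "individuals demonstrate an understanding of the process",
        ],
        "conservative bias": [
            "conservative assumptions are used for emergent or unscheduled work",
            "leaders take a conservative approach when information is incomplete",
        ],
        "single-point accountability": [
            "important nuclear safety decisions are made by the correct person "
            "at the lowest appropriate level",
        ],
    },
}

for attribute, behaviors in decision_making["attributes"].items():
    print(f"{attribute}: {len(behaviors)} sample behavior(s)")
```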

In addition to the above, communicating, explaining, challenging and justifying individual decisions are mentioned throughout the document.  Finally, “Leaders demonstrate a commitment to safety in their decisions and behaviors.” (p. 15)

Our perspective

On the positive side, the INPO treatment of DM is much more comprehensive than what we've seen in the NRC Common Language Path Forward materials released to date.

But the DM example illustrates a major problem with this type of document: a lengthy laundry list of observable behaviors that can morph into de facto requirements.  Now INPO says “. . . this document is not intended to be used as a checklist. It is encouraged that this document be considered for inclusion and use in self-assessments, root cause analyses, and training content, as appropriate.” (p. 3)  But while the observable behaviors may be intended as representative or illustrative, in practice they are likely to become first expectations then requirements.  An overall tone of absolutism reinforces this possibility.

The same tone is evident in the discussion of DM's larger context.  For example, INPO asserts that SC is a board and corporate responsibility but explicit or implicit priorities from above can create constraints on plant management's DM flexibility.  INPO also says “Executives and senior managers ensure sufficient corporate resources are allocated to the nuclear organization for short- and long-term safe and reliable operation” (p. 15) but the top and bottom of the organization may not agree on what level of resources is “sufficient.”

Another problem is the lack of priorities or relative importance.  Are all the traits equally important?  How about the attributes?  And the observable behaviors?  Is it up to, say, a team of QA assessors to determine what they need to include, or do they only look at what the boss says, or do they try to evaluate everything even remotely related to the scope of their inquiry?

But our biggest difficulty is with this statement: “These traits and attributes, when embraced, will be reflected in the values, assumptions, behaviors, beliefs, and norms of an organization and its members.” (p. 3)  This is naïve absolutism at its worst.  While some members of an organization may incorporate new values, others may comply with the rules and exhibit the desired behavior based on other factors, e.g., fear, peer pressure, desire for recognition or power, or money.  And ultimately, who cares why they do it?  As Commissioner Apostolakis said during an NRC meeting when the proposed SC policy was being discussed: “[W]e really care about what people do and maybe not why they do it. . . .”  (See our Feb. 12, 2011 post.)

We could not say it better ourselves.


*  Institute of Nuclear Power Operations (INPO), “Traits of a Healthy Nuclear Safety Culture”  INPO 12-012, Rev. 1 (April 2013).  The report has two addenda.  One describes nuclear safety behaviors and actions that contribute to a healthy nuclear SC by organizational level and the other provides cross-references to other INPO documents, the NRC ROP cross-cutting area components and the IAEA SC characteristics.  Thanks to Madalina Tronea for making these documents available.

** INPO refers to SC “health” while the NRC refers to SC “strength.”

Saturday, July 6, 2013

Behind Human Error by Woods, Dekker, Cook, Johannesen and Sarter

This book* examines how errors occur in complex socio-technical systems.  The authors' thesis is that behind every ascribed “human error” there is a “second story” of the context (conditions, demands, constraints, etc.) created by the system itself.  “That which we label ‘human error’ after the fact is never the cause of an accident.  Rather, it is the cumulative effect of multiple cognitive, collaborative, and organizational factors.” (p. 35)  In other words, “Error is a symptom indicating the need to investigate the larger operational systems and the organizational context in which it functions.” (p. 28)  This post presents a summary of the book followed by our perspective on its value.  (The book has a lot of content so this will not be a short post.)

The Second Story

This section establishes the authors' view of error and how socio-technical systems function.  They describe two mutually exclusive world views: (1) “erratic people degrade an otherwise safe system” vs. (2) “people create safety at all levels of the socio-technical system by learning and adapting . . .” (p. 6)  It should be obvious that the authors favor option 2.

In such a world “Failure, then, represents breakdowns in adaptations directed at coping with complexity.  Indeed, the enemy of safety is not the human: it is complexity.” (p. 1)  “. . . accidents emerge from the coupling and interdependence of modern systems.” (p. 31) 

Adaptation occurs in response to pressures or environmental changes.  For example, systems are under stakeholder pressure to become faster, better, cheaper; multiple goals and goal conflict are regular complex system characteristics.  But adaptation is not always successful.  There may be too little (rules and procedures are followed even though conditions have changed) or too much (adaptation is attempted with insufficient information to achieve goals).  Because of pressure, adaptations evolve toward performance boundaries, in particular, safety boundaries.  There is a drift toward failure. (see Dekker, reviewed here)
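The drift idea lends itself to a toy illustration.  The sketch below is ours, with arbitrary numbers: an operating margin erodes toward the safety boundary under steady efficiency pressure, with only intermittent pushback when near-misses are noticed.  It illustrates the shape of the argument, not any real system.

```python
# Toy sketch (ours) of drift toward failure: efficiency pressure steadily
# erodes the safety margin; occasional near-miss signals restore some of it.
# All numbers are arbitrary.

import random

random.seed(1)
margin = 1.0             # distance from the safety boundary (0 = at the boundary)
efficiency_pressure = 0.04
pushback = 0.15          # margin restored when a near-miss gets attention

for month in range(1, 61):
    margin -= efficiency_pressure * random.uniform(0.5, 1.5)   # steady erosion
    if margin < 0.2 and random.random() < 0.3:                  # near-miss noticed
        margin += pushback
    if margin <= 0:
        print(f"month {month}: safety boundary crossed")
        break
else:
    print(f"no boundary crossing in 5 years; remaining margin = {margin:.2f}")
```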

The authors present 15 premises for analyzing errors in complex socio-technical systems. (pp. 19-30)  Most are familiar but some are worth highlighting and remembering when thinking about system errors:

  • “There is a loose coupling between process and outcome.”  A “bad” process does not always produce bad outcomes and a “good” process does not always produce good outcomes.
  • “Knowledge of outcome (hindsight) biases judgments about process.”  More about that later.
  • “Lawful factors govern the types of erroneous actions or assessments to be expected.”   In other words, “errors are regular and predictable consequences of a variety of factors.”
  • “The design of artifacts affects the potential for erroneous actions and paths towards disaster.”  This is Human Factors 101 but problems still arise.  “Increased coupling increases the cognitive demands on practitioners.”  Increased coupling plus weak feedback can create a latent failure.

Complex Systems Failure


This section covers traditional mental models used for assessing failures and points out the putative inadequacies of each.  The sequence-of-events (or domino) model is familiar Newtonian causal analysis.  Man-made disaster theory puts company culture and institutional design at the heart of the safety question.  Vulnerability develops over time but is hidden by the organization’s belief that it has risk under control.  A system or component is driven into failure.  The latent failure (or Swiss cheese) model proposes that “disasters are characterized by a concatenation of several small failures and contributing events. . .” (p. 50)  While a practitioner may be closest to an accident, the associated latent failures were created by system managers, designers, maintainers or regulators.  All these models reinforce the search for human error (someone untrained, inattentive or a “bad apple”) and the customary fixes (more training, procedure adherence and personal attention, or targeted discipline).  They represent a failure to adopt systems thinking and concepts of dynamics, learning, adaptation and the notion that a system can produce accidents as a natural consequence of its normal functioning.

A more sophisticated set of models is then discussed.  Perrow's normal accident theory says that “accidents are the structural and virtually inevitable product of systems that are both interactively complex and tightly coupled.” (p. 61)  Such systems structurally confuse operators and prevent them from recovering when incipient failure is discovered.  People are part of the Perrowian system and can exhibit inadequate expertise.  Control theory sees systems as composed of components that must be kept in dynamic equilibrium based on feedback and continual control inputs—basically a system dynamics view.  Accidents are a result of normal system behavior and occur when components interact to violate safety constraints and the feedback (and control inputs) do not reflect the developing problems.  Small changes in the system can lead to huge consequences elsewhere.  Accident avoidance is based on making system performance boundaries explicit and known although the goal of efficiency will tend to push operations toward the boundaries.  In contrast, the authors would argue for a different focus: making the system more resilient, i.e., error-tolerant.**  High reliability theory describes how high-hazard activities can achieve safe performance through leadership, closed systems, functional decentralization, safety culture, redundancy and systematic learning.  High reliability means minimal variations in performance, which, in the short term, means safe performance, but HROs are subject to incidents indicative of residual system noise and unseen changes from social forces, information management or new technologies. (See Weick, reviewed here)
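The control-theoretic framing is easy to caricature in a few lines.  The sketch below is ours, not taken from the book: a proportional controller keeps a drifting process inside a safety constraint as long as its feedback reflects reality, and loses the process once the feedback goes stale.

```python
# Sketch (ours) of the control-theory view: a controller holds a drifting
# process inside a safety constraint using feedback.  If the measurement stops
# reflecting the real state, the constraint is eventually violated even though
# the controller keeps issuing "control actions."

from typing import Optional

def run(feedback_fails_at: Optional[int] = None, steps: int = 30) -> str:
    actual = 0.0
    safety_limit = 10.0
    disturbance = 1.0                              # the process drifts upward each step
    for t in range(steps):
        stale = feedback_fails_at is not None and t >= feedback_fails_at
        measured = 0.0 if stale else actual        # what the controller "sees"
        actual += disturbance - 0.5 * measured     # control action based on the measurement
        if actual > safety_limit:
            return f"safety constraint violated at step {t}"
    return "safety constraint maintained"

print(run())                       # healthy feedback loop
print(run(feedback_fails_at=5))    # feedback no longer reflects the developing problem
```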

Standing on the shoulders of the above sophisticated models, resilience engineering (RE) is proposed as a better way to think about safety.  According to this model, accidents “represent the breakdowns in the adaptations necessary to cope with the real world complexity.” (p. 83)  The authors use the Columbia space shuttle disaster to illustrate patterns of failure evident in complex systems: drift toward failure, past success as reason for continued confidence, fragmented problem-solving, ignoring new evidence and intra-organizational communication breakdowns.  To oppose or compensate for these patterns, RE proposes monitoring or enhancing other system properties including: buffering capacity, flexibility, margin and tolerance (which means replacing quick collapse with graceful degradation).  RE “focuses on what sustains or erodes the adaptive capacities of human-technical systems in a changing environment.” (p. 93)  In practice, that means detecting signs of increasing risk, having resources for safety available, and recognizing when and where to invest to offset risk.  It also requires focusing on organizational decision making, e.g., cross checks for risky decisions, the safety-production-efficiency balance and the reporting and disposition of safety concerns.  “Enhancing error tolerance, detection and recovery together produce safety.” (p. 26)

Operating at the Sharp End

An organization's sharp end is where practitioners apply their expertise in an effort to achieve the organization's goals.  The blunt end is where support functions, from administration to engineering, work.  The blunt end designs the system, the sharp end operates it.  Practitioner performance is affected by cognitive activities in three areas: activation of knowledge, the flow of attention and interactions among multiple goals.

The knowledge available to practitioners arrives as organized content.  Challenges include: the organization of that content may be poor, the content may be incomplete or it may simply be wrong.  Practitioner mental models may be inaccurate or incomplete without the practitioners realizing it, i.e., they may be poorly calibrated.  Knowledge may be inert, i.e., not accessed when it is needed.  Oversimplifications (heuristics) may work in some situations but produce errors in others and limit the practitioner's ability to account for uncertainties or conflicts that arise in individual cases.  The discussion of heuristics suggests Hollnagel, reviewed here.

Mindset is about attention and its control.” (p. 114)  Attention is a limited resource.  Problems with maintaining effective attention include loss of situational awareness, in which the practitioner's mental model of events doesn't match the real world, and fixation, where the practitioner's initial assessment of  a situation creates a going-forward bias against accepting discrepant data and a failure to trigger relevant inert knowledge.  Mindset seems similar to HRO mindfulness. (see Weick)

Goal conflict can arise from many sources including management policies, regulatory requirements, economic (cost) factors and risk of legal liability.  Decision making must consider goals (which may be implicit), values, costs and risks—which may be uncertain.  Normalization of deviance is a constant threat.  Decision makers may be held responsible for achieving a goal but lack the authority to do so.  The conflict between cost and safety may be subtle or unrecognized.  “Safety is not a concrete entity and the argument that one should always choose the safest path misrepresents the dilemmas that confront the practitioner.” (p. 139)  “[I]t is difficult for many organizations (particularly in regulated industries) to admit that goal conflicts and tradeoff decisions arise.” (p. 139)  Overall, the authors present a good discussion of goal conflict.

How Design Can Induce Error


The design of computerized devices intended to help practitioners can instead lead to greater risks of errors and incidents.  Specific causes of problems include clumsy automation, limited information visibility and mode errors. 

Automation is supposed to increase user effectiveness and efficiency.  However, clumsy automation creates situations where the user loses track of what the computer is set up to do, what it's doing and what it will do next.  If support systems are so flexible that users can't know all their possible configurations, they adopt simplifying strategies which may be inappropriate in some cases.  Clumsy automation leads to more (instead of less) cognitive work, user attention is diverted to the machine instead of the task, increased potential for new kinds of errors and the need for new user knowledge and judgments.  The machine effectively has its own model of the world, based on user inputs, data sensors and internal functioning, and passes that back to the user.

Machines often hide a mass of data behind a narrow keyhole of visibility into the system.  Successful design creates “a visible conceptual space meaningfully related to activities and constraints in a field of practice.” (p. 162)  In addition, “Effective representations highlight  'operationally interesting' changes for sequences of behavior . . .” (p. 167)  However, default displays typically do not make interesting events directly visible.

Mode errors occur when an operator initiates an action that would be appropriate if the machine were in mode A but, in fact, it is in mode B.  (This may be a man-machine problem but it's not the machine's fault.)  A machine can change modes based on situational and system factors in addition to operator input.  Operators have to maintain mode awareness, not an easy task when viewing a small, cluttered display that may not highlight current mode or mode changes.
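A mode error is easy to show concretely.  The sketch below uses a hypothetical infusion-pump-like device of our own invention (not an example from the book): the same keystrokes are correct in one mode and hazardous in another, and the mode can change without operator input or a salient display cue.

```python
# Minimal sketch (ours) of a mode error with a hypothetical device: identical
# operator actions mean different things in different machine modes.

class InfusionPumpSketch:
    def __init__(self):
        self.mode = "rate"

    def auto_mode_change(self):
        # The machine changes modes on its own, e.g., from an internal condition.
        self.mode = "volume"

    def key_in(self, value: float) -> str:
        # The same keystrokes are interpreted according to the current mode.
        if self.mode == "rate":
            return f"rate set to {value} ml/hr"
        return f"total volume set to {value} ml"

pump = InfusionPumpSketch()
print(pump.key_in(100))     # operator intends 100 ml/hr -- correct in 'rate' mode
pump.auto_mode_change()     # mode changes without a salient cue
print(pump.key_in(100))     # same action now sets 100 ml total: a mode error
```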

To cope with bad design “practitioners adapt information technology provided for them to the immediate tasks at hand in a locally pragmatic way, . . .” (p. 191)  They use system tailoring where they adapt the device, often by focusing on a feature set they consider useful and ignoring other machine capabilities.  They use task tailoring where they adapt strategies to accommodate constraints imposed by the new technology.  Both types of adaptation can lead to success or eventual failures. 

The authors suggest various countermeasures and design changes to address these problems. 

Reactions to Failure

Different approaches for analyzing accidents lead to different perspectives on human error. 

Hindsight bias is “the tendency for people to 'consistently exaggerate what could have been anticipated in foresight.'” (p. 15)  It reinforces the tendency to look for the human in the human error.  Operators are blamed for bad outcomes because they are available, tracking back to multiple contributing causes is difficult, most system performance is good and investigators tend to judge process quality by its outcome.  Outsiders tend to think operators knew more about their situation than they actually did.  Evaluating process instead of outcome is also problematic.  Process and outcome are loosely coupled and what standards should be used for process evaluation?  Formal work descriptions “underestimate the dilemmas, interactions between constraints, goal conflicts, and tradeoffs present in the actual workplace.” (p. 208)  A suggested alternative approach is to ask what other practitioners would have done in the same situation and build a set of contrast cases.  “What we should not do, . . . is rely on putatively objective external evaluations . . . such as . . . court cases or other formal hearings.  Such processes in fact institutionalize and legitimate the hindsight bias . . . leading to blame and a focus on individual actors at the expense of a system view.” (pp. 213-214)

Distancing through differencing is another risk.  In this practice, reviewers focus on differences between the context surrounding an accident and their own circumstance.  Blaming individuals reinforces the belief that there are no lessons to be learned for other organizations.  If human error is local and individual (as opposed to systemic) then sanctions, exhortations to follow the procedures and remedial training are sufficient fixes.  There is a decent discussion of TMI here, where, in the authors' opinion, the initial sense of fundamental surprise and need for socio-technical fixes was soon replaced by a search for local, technologically-focused solutions.
      
There is often pressure to hold people accountable after incidents or accidents.  One answer is a “just culture” which views incidents as system learning opportunities but also draws a line between acceptable and unacceptable behavior.  Since the “line” is an attribution, the key question for any organization is who gets to draw it.  Another challenge is defining the discretionary space where individuals alone have the authority to decide how to proceed.  There is more on just culture but this is all (or mostly) Dekker. (see our Just Culture commentary here)

The authors' recommendations for analyzing errors and improving safety can be summed up as follows: recognize that human error is an attribution; pursue second stories that reveal the multiple, systemic contributors to failure; avoid hindsight bias; understand how work really gets done; search for systemic vulnerabilities; study how practice creates safety; search for underlying patterns; examine how change will produce new vulnerabilities; use technology to enhance human expertise; and tame complexity. (p. 239)  “Safety is created at the sharp end as practitioners interact with hazardous processes . . . using the available tools and resources.” (p. 243)

Our Perspective

This is a book about organizational characteristics and socio-technical systems.  Recommendations and advice are aimed at organizational policy makers and incident investigators.  The discussion of a “just culture” is the only time culture is discussed in detail although safety culture is mentioned in passing in the HRO write-up.

Our first problem with the book is repeatedly referring to medicine, aviation, aircraft carrier operations and nuclear power plants as complex systems.***  Although medicine is definitely complex and aviation (including air traffic control) possibly is, carrier operations and nuclear power plants are simply complicated.  While carrier and nuclear personnel have to make some adaptations on the fly, they do not face sudden, disruptive changes in their technologies or operating environments and they are not exposed to cutthroat competition.  Their operations are tightly coordinated but, where possible, by design more loosely coupled to facilitate recovery if operations start to go sour.  In addition, calling nuclear power operations complex perpetuates the myth that nuclear is “unique and special” and thus merits some special place in the pantheon of industry.  It isn't and it doesn't.

Our second problem relates to the authors' recasting of the nature of human error.  We decry the rush to judgment after negative events, particularly a search limited to identifying culpable humans.  The search for bad apples or outright criminals satisfies society's perceived need to bring someone to justice and the corporate system's desire to appear to fix things through management exhortations and training without really admitting systemic problems or changing anything substantive, e.g., the management incentive plan.  The authors' plea for more systemic analysis is thus welcome.

But they push the pendulum too far in the opposite direction.  They appear to advocate replacing all human errors (except for gross negligence, willful violations or sabotage) with systemic explanations, aka rationalizations.  What is never mentioned is that medical errors lead to tens of thousands of preventable deaths per year.****  In contrast, U.S. commercial aviation has not experienced an accident with over a hundred fatalities (excluding 9/11) in more than a decade; carriers and nuclear power plants experience accidents, but there are few fatalities.  At worst, this book is a denial that real human errors (including bad decisions, slip ups, impairments, coverups) occur and a rationalization of medical mistakes caused by arrogance, incompetence, class structure and lack of accountability.

This is a dense book, 250 pages of small print, with an index that is nearly useless.  Pressures (most likely cost and schedule) have apparently pushed publishing to the system boundary for copy editing—there are extra, missing and wrong words throughout the text.

This 2010 second edition updates the original 1994 monograph.  Many of the original ideas have been fleshed out elsewhere by the authors (primarily Dekker) and others.  Some references, e.g., Hollnagel, Perrow and the HRO school, should be read in their original form. 


*  D.D. Woods, S. Dekker, R. Cook, L. Johannesen and N. Sarter, Behind Human Error, 2d ed.  (Ashgate, Burlington, VT: 2010).  Thanks to Bill Mullins for bringing this book to our attention.

**  There is considerable overlap of the perspectives of the authors and the control theorists (Leveson and Rasmussen are cited in the book).  As an aside, Dekker was a dissertation advisor for one of Leveson's MIT students.

***  The authors' different backgrounds contribute to this mash-up.  Cook is a physician, Dekker is a pilot and some of Woods' cited publications refer to nuclear power (and aviation).

****  M. Makary, “How to Stop Hospitals From Killing Us,” Wall Street Journal online (Sept. 21, 2012).  Retrieved July 4, 2013.