Monday, October 14, 2013

High Reliability Management by Roe and Schulman

This book* presents a multi-year case study of the California Independent System Operator (CAISO), the organization created to operate California's electricity grid when the state deregulated its electricity market.  CAISO's travails read like The Perils of Pauline, but our primary interest lies in the authors' observations of the different grid management strategies CAISO used under various operating conditions; the book is a comprehensive description of contingency management in the real world.  In this post we summarize the authors' management model, discuss its application to nuclear management and opine on the implications for nuclear safety culture.

The High Reliability Management (HRM) Model

The authors call the model they developed High Reliability Management and present it in a 2x2 matrix where the axes are System Volatility and Network Options Variety. (Ch. 3)  System Volatility refers to the magnitude and rate of change of CAISO's environmental variables, including generator and transmission availability, reserves, electricity prices, contracts, the extent to which providers are playing fair or gaming the system, weather, temperature and electricity demand (regional and overall).  Network Options Variety refers to the range of resources and strategies available for meeting demand (basically in real time) given the current inputs.

System Volatility and Network Options Variety can each be High or Low, so there are four possible modes and a distinctive operating management approach for each.  All modes must address CAISO's two missions of matching electricity supply and demand, and protecting the grid.  Operators must manage the system inside an acceptable or tolerable performance bandwidth (invariant output performance is a practical impossibility) in all modes.  Operating conditions are challenging: supply and demand are inherently unstable (p. 34); inadequate supply means some load cannot be served, and too much generation can damage the grid. (pp. 27, 142)

High Volatility and High Options mean both generation (supply) and demand are changing quickly and the operators have multiple strategies available for maintaining balance.  Some strategies can be substituted for others.  It is a dynamic but manageable environment.

High Volatility and Low Options mean both generation and demand are changing quickly but the operators have few strategies available for maintaining balance.  They run from pillar to post; it is highly stressful.  Sometimes they have to create ad hoc (undocumented and perhaps untried) approaches using trial and error.  Demand can be satisfied but regulatory limits may be exceeded, and the system runs closer to the edge of technical capabilities and operator skills.  It is the most unstable performance mode and untenable because the operators are losing control and one perturbation can amplify into another. (p. 37)

Low Volatility and Low Options mean generation and demand are not changing quickly.  The critical feature here is that demand has been reduced by load shedding.  The operators have exhausted all other strategies for maintaining balance.  It is a command-and-control approach, effected by declaring a Stage 3 grid situation and run using formal rules and procedures.  It is the least desirable domain because one primary mission, to meet all demand, is not being accomplished.

Low Volatility and High Options is the HRM's preferred mode.  Actual demand follows the forecast, generators are producing as expected, reserves are on hand, and there is no congestion on transmission lines, or backup routes are available.  Procedures based on analyzed conditions exist and are used.  There are few, if any, surprises.  Learning can occur but it is incremental, the result of new methods or analysis.  Performance is important and system behavior stays within a narrow bandwidth.  Loss of attention (complacency) is a risk.  Is this starting to sound familiar?  This is the domain of High Reliability Organization (HRO) theory and practice.  Nuclear power operations are an example of an HRO. (pp. 60-62)
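The four modes boil down to a simple two-axis lookup.  The sketch below is our own illustration, not the authors'; the mode labels paraphrase the descriptions above.

```python
# Illustrative sketch (ours, not the authors'): the four HRM performance modes
# as a lookup over the two axes, System Volatility and Network Options Variety.
from enum import Enum

class Level(Enum):
    LOW = "low"
    HIGH = "high"

def hrm_mode(system_volatility: Level, options_variety: Level) -> str:
    """Return a shorthand description of the corresponding performance mode."""
    modes = {
        (Level.HIGH, Level.HIGH): "High volatility / high options: dynamic but manageable",
        (Level.HIGH, Level.LOW):  "High volatility / low options: ad hoc, unstable, near the edge",
        (Level.LOW,  Level.LOW):  "Low volatility / low options: command-and-control (Stage 3 load shedding)",
        (Level.LOW,  Level.HIGH): "Low volatility / high options: procedure-based, HRO-like operations",
    }
    return modes[(system_volatility, options_variety)]

print(hrm_mode(Level.LOW, Level.HIGH))
```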

Lessons for Nuclear Operations 


Nuclear plants work hard to stay in the Low Volatility/High Options mode.  If they stray into the Low Options column, they run the risks of facing unanalyzed situations and regulatory non-compliance. (p. 62)  In their effort to optimize performance in the desired mode, plants examine their performance risks at ever finer granularity through new methods and analyses.  Because of the organizations' narrow focus, few resources are directed at identifying, contemplating and planning for very low probability events (the tails of distributions) that might force a plant into a different mode or have enormous potential negative consequences.**  Design changes (especially new technologies) that increase output or efficiency may mask subtle warning signs of problems; organizations must be mindful of performance drift and nascent problems.

In an HRO, trial and error is not an acceptable method for trying out new options.  No one wants cowboy operators in the control room.  But examining new options using off-line methods, in particular simulation, is highly desirable. (pp. 111, 233)  In addition, building reactive capacity in the organization can be a substitute for foresight to accommodate the unexpected and unanalyzed. (pp. 116-17)  

The focus on the external changes that buffeted CAISO leads to a shortcoming when looking for lessons for nuclear.  The book emphasizes CAISO's adaptability to new environmental demands, requirements and constraints but does not adequately recognize the natural evolution of the system.  In nuclear, it's natural evolution that may quietly lead to performance drift and normalization of deviance.  In a similar vein, CAISO has to worry about complacency in just one mode; for nuclear it's effectively the only mode, and complacency is an omnipresent threat. (p. 126)

The risk of cognitive overload occurs more often for CAISO operators but it has visible precursors; for nuclear operators the risk is that overload might occur suddenly and with little or no warning.***  Anticipation and resilience are more obvious needs at CAISO but are also necessary in nuclear operations. (pp. 5, 124)

Implications for Safety Culture

Both HRMs and HROs need cultures that value continuous training, open communications, team players able to adjust authority relationships when facing emergent issues, personal responsibility for safety (i.e., safety does not inhere in technology), ongoing learning to do things better and reduce inherent hazards, rewards for achieving safety and penalties for compromising it, and an overall discipline dedicated to failure-free performance. (pp. 198, App. 2)  Both organizational types need a focus on operations as the central activity.  Nuclear is good at this, certainly better than CAISO, where entities outside of operations promulgated system changes and the operators were stuck with making them work.

The willingness to report errors should be encouraged, but we have seen that it is a thin spot in the safety culture (SC) at some plants.  Errors can be a gateway into learning how to create more reliable performance, and error tolerance vs. intolerance is a critical cultural issue. (pp. 111-12, 220)

The simultaneous need to operate within a prescribed envelope while considering how that envelope might be breached has implications for SC.  We have argued before that a nuclear organization is well served by having a diversity of opinions and some people who don't subscribe to groupthink and instead keep asking “What's the worst case scenario and how would we manage it to an acceptable conclusion?”

Conclusion

This review gives short shrift to the authors' broad and deep description and analysis of CAISO.****  The reason is that the major takeaway for CAISO, viz., the need to recognize mode shifts and switch management strategies accordingly as the manifestation of “normal” operations, is not really applicable to day-to-day nuclear operations.

The book describes a rare breed, the socio-technical-political start-up, and has too much scope for the average nuclear practitioner to plow through searching for newfound nuggets that can be applied to nuclear management.  But it's a good read and full of insightful observations, e.g., the description of CAISO's early days (ca. 2001-2004) when system changes driven by engineers, politicians and regulators, coupled with changing challenges from market participants, prevented the organization from settling in and effectively created a negative learning curve, with operators reporting less confidence in their ability to manage the grid and accomplish the mission in 2004 vs. 2001. (Ch. 5)

(High Reliability Management was recommended by a Safetymatters reader.  If you have a suggestion for material you would like to see promoted and reviewed, please contact us.)

*  E. Roe and P. Schulman, High Reliability Management (Stanford Univ. Press, Stanford, CA: 2008).  This book reports the authors' study of CAISO from 2001 through 2006.

**  By their nature as baseload generating units, usually with long-term sales contracts, nuclear plants are unlikely to face a highly volatile business environment.  Their political and social environment is similarly stable: the NRC buffers them from direct interference by politicians, although activists prodding state and regional authorities, e.g., water quality boards, can cause distractions and disruptions.

The importance of considering low-probability, major-consequence events is argued by Taleb (see here) and Pariès (see here).

***  Over the course of the authors' investigation, technical and management changes at CAISO intended to make operations more reliable often had the unintended effect of moving the edge of the prescribed performance envelope closer to the operators' cognitive and skill capacity limits. 

The Cynefin model describes how organizational decision making can suddenly slip from the Simple domain to the Chaotic domain via the Complacent zone.  For more on Cynefin, see here and here.

****  For instance, ch. 4 presents a good discussion of the inadequate or incomplete applicability of Normal Accident Theory (Perrow, see here) or High Reliability Organization theory (Weick, see here) to the behavior the authors observed at CAISO.  As an example, tight coupling (a threat according to NAT) can be used as a strength when operators need to stitch together an ad hoc solution to meet demand. (p. 135)

Ch. 11 presents a detailed regression analysis linking volatility in selected inputs to volatility in output, measured by the periods when electricity made available (compared to demand) fell outside regulatory limits.  This analysis illustrated how well CAISO's operators were able to manage in different modes and how close they were coming to the edge of their ability to control the system; in other words, performance as a precursor to the need to go to Stage 3 command-and-control load shedding.
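The basic idea behind that analysis, regressing a measure of output volatility on measures of input volatility, can be sketched as follows.  This is a minimal illustration with made-up data and variable names, not the authors' actual model.

```python
# Minimal sketch of the idea (made-up data, not the authors' model): regress a
# measure of output volatility on measures of input volatility via ordinary
# least squares.
import numpy as np

rng = np.random.default_rng(0)
n = 200  # hypothetical observation periods

# Hypothetical input-volatility measures (e.g., swings in forecast error,
# generation availability, prices) and a synthetic output measure (e.g., time
# outside the regulatory control band per period).
X = rng.normal(size=(n, 3))
true_beta = np.array([0.8, 0.3, 0.1])
y = X @ true_beta + rng.normal(scale=0.5, size=n)

# Fit with an intercept column and inspect the estimated coefficients.
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("estimated coefficients (intercept first):", np.round(beta_hat, 2))
```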

Friday, September 27, 2013

Four Years of Safetymatters

Aztec Calendar
Over the four plus years we have been publishing this blog, regular readers will have noticed some recurring themes in our posts.  The purpose of this post is to summarize our perspective on these key themes.  We have attempted to build a body of work that is useful and insightful for you.

Systems View

We have consistently considered safety culture (SC) in the nuclear industry to be one component of a complicated socio-technical system.  A systems view provides a powerful mental model for analyzing and understanding organizational behavior. 

Our design and explicative efforts began with system dynamics as described by authors such as Peter Senge, focusing on characteristics such as feedback loops and time delays that can affect system behavior and lead to unexpected, non-linear changes in system performance.  Later, we expanded our discussion to incorporate the ways systems adapt and evolve over time in response to internal and external pressures.  Because they evolve, socio-technical organizations are learning organizations but continuous improvement is not guaranteed; in fact, evolution in response to pressure can lead to poorer performance.

The systems view, system dynamics and their application through computer simulation techniques are incorporated in the NuclearSafetySim management training tool.
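As a toy illustration of the feedback-and-delay point above (our own sketch, not NuclearSafetySim): a corrective policy that acts on delayed information about a performance gap can overshoot and oscillate instead of settling smoothly.

```python
# Toy system-dynamics sketch (ours, not NuclearSafetySim): management corrects a
# performance gap, but acts on information that is several periods old.
# A corrective policy that converges with fresh information oscillates and
# diverges as the information delay grows.

def simulate(delay_steps: int, gain: float = 0.5, steps: int = 40) -> list[float]:
    target = 100.0
    performance = 60.0
    history = [performance]
    for _ in range(steps):
        # Management reacts to the gap it perceived 'delay_steps' periods ago.
        perceived = history[max(0, len(history) - 1 - delay_steps)]
        performance += gain * (target - perceived)
        history.append(performance)
    return history

for delay in (0, 4):
    trace = simulate(delay_steps=delay)
    print(f"delay={delay}: last five values {[round(x, 1) for x in trace[-5:]]}")
```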

Decision Making

A critical, defining activity of any organization is decision making.  Decision making determines what will (or will not) be done, by whom, and with what priority and resources.  Decision making is directed and constrained by factors including laws, regulations, policies, goals, procedures and resource availability.  In addition, decision making is imbued with and reflective of the organization's values, mental models and aspirations, i.e., its culture, including safety culture.

Decision making is intimately related to an organization's financial compensation and incentive program.  We've commented on these programs in nuclear and non-nuclear organizations and identified the performance goals for which executives received the largest rewards; often, these were not safety goals.

Decision making is part of the behavior exhibited by senior managers.  We expect leaders to model desired behavior and are disappointed when they don't.  We have provided examples of good and bad decisions and leader behavior. 

Safety Culture Assessment


We have cited NRC Commissioner Apostolakis' observation that “we really care about what people do and maybe not why they do it . . .”  We sympathize with that view.  If organizations are making correct decisions and getting acceptable performance, the “why” is not immediately important.  However, in the longer run, trying to identify the why is essential, both to preserve organizational effectiveness and to provide a management (and mental) model that can be transported elsewhere in a fleet or industry.

What is not useful, and possibly even a disservice, is a feckless organizational SC “analysis” that focuses on a laundry list of attributes or limits remedial actions to retraining, closer oversight and selective punishment.  Such approaches ignore systemic factors and cannot provide long-term successful solutions.

We have always been skeptical of the value of SC surveys.  Over time, we saw that others shared our view.  Currently, broad-scope, in-depth interviews and focus groups are recognized as preferred ways to attempt to gauge an organization's SC and we generally support such approaches.

On a related topic, we were skeptical of the NRC's SC initiatives, which culminated in the SC Policy Statement.  As we have seen, this “policy” has led to back door de facto regulation of SC.

References and Examples

We've identified a library of references related to SC.  We review the work of leading organizational thinkers, social scientists and management writers, attempt to accurately summarize their work and add value by relating it to our views on SC.  We've reported on the contributions of Dekker, Dörner, Hollnagel, Kahneman, Perin, Perrow, Reason, Schein, Taleb, Vaughan, Weick and others.

We've also posted on the travails of organizations that dug themselves into holes that brought their SC into question.  Some of these were relatively small potatoes, e.g., Vermont Yankee and EdF, but others were actual disasters, e.g., Massey Energy and BP.  We've also covered DOE, especially the Hanford Waste Treatment and Immobilization Plant (aka the Vit plant).

Conclusion

We believe the nuclear industry is generally well-managed by well-intentioned personnel but can be affected by the natural organizational ailments of complacency, normalization of deviance, drift, hubris, incompetence and occasional criminality.  Our perspective has evolved as we have learned more about organizations in general and SC in particular.  Channeling John Maynard Keynes, we adapt our models when we become aware of new facts or better ways of looking at the data.  We hope you continue to follow Safetymatters.

Tuesday, September 24, 2013

Safety Paradigm Shift

We came across a provocative and persuasive presentation by Jean Pariès of Dédale, "Why a Paradigm Shift Is Needed," from the IAEA Experts Meeting in May of this year.*  Many of the points resonate with our views on nuclear safety management: in particular, complexity; the fallacy of the "predetermination envelope" (making a system more reliable within its design envelope but more susceptible outside that envelope); deterministic and probabilistic rationalization that avoids dealing with the complexity of the system; and unknown-unknowns.  We also believe it will take a paradigm shift, however unlikely that may be, at least in the U.S. nuclear industry.  Interestingly, Pariès does not appear to have a nuclear power background and develops his paradigm argument across multiple events and industries.

Pariès poses a very fundamental question: since the current safety construct has shown vulnerabilities to actual off-normal events, should the response be to do more of the same, but better and with more rigor?  Or should the safety paradigm itself be challenged?  The key issue underlying the challenge to this construct is how to cope with complexity.  He means complexity in the same terms we have posted about numerous times.

Pariès notes “The uncertainty generated by the complexity of the system itself and by its environment is skirted through deterministic or probabilistic rationality.” (p. 8)  We agree.  Any review of condition reports and Tech Spec variances indicates a wholesale reliance on risk-based rationales for deviations from nominal requirements.  And the risk-based argument is almost always based on an estimated small probability of an event that would challenge safety, often enhanced by a relatively short exposure time frame.  As we highlighted in a prior post, Nick Taleb has long cautioned against making decisions based on assessments of probabilities, which he asserts we cannot know, versus consequences, which are (sometimes uncomfortably) knowable.
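To see how such an argument works, consider a purely hypothetical calculation with made-up numbers (not drawn from any actual condition report):

```python
# Purely hypothetical arithmetic (made-up numbers) showing how a risk-based
# argument can look small when framed over a short exposure window.
initiating_event_freq = 1e-3      # assumed initiating events per year
exposure_window_days = 14         # assumed duration of the deviation
conditional_failure_prob = 0.1    # assumed probability the deviation matters

incremental_risk = (initiating_event_freq
                    * (exposure_window_days / 365)
                    * conditional_failure_prob)
print(f"incremental risk over the window: {incremental_risk:.1e}")  # ~3.8e-06

# Taleb's caution applies to the inputs: the estimated probabilities are highly
# uncertain, while the consequences, should the event occur, are knowable and large.
```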

How does this relate to safety management issues including culture?

We see a parallel between the constructs for nuclear safety and safety culture.  The nuclear safety construct is constrained both in focus and evolution, heavily reliant on the design basis philosophy (what Pariès labels the “predetermination fallacy”) dating back to the 1960s.  Little has changed over the succeeding 50 years; even the advent of PRA has been limited to “informing” the implementation of this approach.

Safety culture has emerged over the last 10+ years as an added regulatory emphasis, though highly constrained in its manifestation as a policy statement.  (It is in fact still quite difficult to square the NRC’s characterization of safety culture as critical to safety** with its stopping well short of any regulation or requirements.)  The definitional scope of safety culture is expressed in a set of traits and related values and behaviors.  As with nuclear safety, it has a limited scope and relies on abstractions emphasizing, in essence, individual morality.  It does not look beyond people to the larger environment and “system” within which people function.  This environment can bring to bear significant influences that challenge the desired traits and values of safety culture policy and muddle their application to decisions and actions.  The limitations can be seen in assessments of safety culture (surveys and the like) as well as in investigations of specific events, violations or non-conformances by licensees and the NRC.  We’ve read many of these and rarely have we encountered any probing of the “why” associated with perceived breakdowns in safety culture.

One exception, and a very powerful case in point, is contained in our post dated July 29, 2010.***  The cited reference is an internal root cause analysis performed by FPL to address employee concerns and identified weaknesses in their corrective action program.  They cite production pressures as negatively impacting employee trust and recognition, and perceptions of management and operational decisions.  FPL took steps to change the origin and impact of production pressures, relieving some of the burden on the organization to contain those influences within the boundaries of safe operation.

Perhaps the NRC believes that it does not have the jurisdiction to probe these types of issues or even require licensees to assess their influence.  Yet the NRC routinely refers to “licensee burden” - cost, schedule, production impacts - in accepting deviations from nominal safety standards.****  We wonder if a broader view of safety culture in the context of the socio-technical system might better “inform” both regulatory policy and decisions and enhance safety management.


*  J. Pariès (Dédale), "Why a Paradigm Shift Is Needed," IAEA International Experts’ Meeting on Human and Organizational Factors in Nuclear Safety in the Light of the Accident at the Fukushima Daiichi Nuclear Power Plant, Vienna, May 21-24, 2013.


**  The NRC’s Information Notice 2013-15 states that safety culture is “essential to nuclear safety in all phases…”
 

***  "NRC Decision on FPL (Part 2)," Safetymatters (July 29, 2010).  See slide 18, Root Cause 2 and Contributing Causes 2.2 and 2.4. 

****  10 CFR 50.55a(g)(6)(i) states that the Commission may grant such relief and may impose such alternative requirements as it determines is authorized by law and will not endanger life or property or the common defense and security and is otherwise in the public interest, giving due consideration to the burden upon the licensee (emphasis added).

Tuesday, September 17, 2013

Even Macy’s Does It

We have long been proponents of looking for innovative ways to improve safety management training for nuclear professionals.  We've taken on the burden of developing a prototype management simulator, NuclearSafetySim, and made it available to our readers to experience for themselves (see our July 30, 2013 post).  In the past we have also noted other industries and organizations that have embraced simulation as an effective management training tool.

An August article in the Wall Street Journal* cites several examples of new approaches to manager training.  Most notable in our view is Macy’s use of simulations to have managers gain decision making experience.  As the article states:

“The simulation programs aim to teach managers how their daily decisions can affect the business as a whole.”

We won’t revisit all the arguments that we’ve made for taking a systems view of safety management, focusing on decisions as the essence of safety culture and using simulation to allow personnel to actualize safety values and priorities.  All of these could only enrich, challenge and stimulate training activities. 

A Clockwork Magenta

 
On the other hand, what is the value of training approaches that reiterate INPO slide shows, regulatory policy statements and good practices in seemingly endless iterations?  It brings to mind the character Alex, the incorrigible sociopath in A Clockwork Orange with an unusual passion for classical music.**  He is the subject of “reclamation treatment,” head clamped in a brace and eyes pinned wide open, forced to watch repetitive screenings of anti-social behavior set to the music of Beethoven’s Fifth.  We are led to believe this results in a “cure,” but does it, and at what cost?

Nuclear managers may not be treated exactly like Alex but there are some similarities.  After plant problems occur and are diagnosed, managers are also declared “cured” after each forced feeding of traits, values, and the need for increased procedure adherence and oversight.  Results still not satisfactory?  Repeat.



*  R. Feintzeig, "Building Middle-Manager Morale," Wall Street Journal (Aug. 7, 2013).  Retrieved Sept. 24, 2013.

**  M. Amis, "The Shock of the New: ‘A Clockwork Orange’ at 50," New York Times Sunday Book Review (Aug. 31, 2013).  Retrieved Sept. 24, 2013.

Thursday, September 12, 2013

Bad Eggs?

We’ve often thought that intentional or willful violations of safety/regulatory requirements could provide a useful window into the dynamics of safety culture.  Now the NRC has just issued an Information Notice* listing recent instances of willful violations.  The Notice is titled “Willful Misconduct/Record Falsification and Nuclear Safety Culture” and reports on seven recent instances of such conduct.  From the title and throughout the notice, the NRC asserts a link between willful violations and nuclear safety culture.  To wit, it states, “An effective safety-culture is essential to nuclear safety at all phases of design, construction and operation and can help prevent willful misconduct by ensuring expectations and consequences are clearly stated and understood.” (p. 5)  The NRC adds, “The above willful misconduct issues and discussion highlights the need... to establish and implement an effective nuclear safety-culture.  This includes training, adequate oversight, and frequent communications especially for workers new to the nuclear industry.” (p. 6)

What we see here is consistent with the NRC’s pro forma approach to organizational safety performance issues.  The problem is culture; the answer is more training, more clarity of expectations, more oversight.**  Oh, and disciplinary actions for the errant individuals.  

Are we to take from this that the individuals involved in these situations are just “bad eggs”?  And the answer is some punishment and re-education?  Is this even consistent with the nature of willful violations and does the sheer number of recent experiences raise more fundamental questions, the most basic of which is “Why?”

Let’s start with what is different about willful violations.  Willful violations are deliberate, intentional and knowing.  In other words, the individual knows his/her actions are against established policies or procedures.  This is not a case of carelessness or lack of knowledge of what is expected.  Thus it is hard to understand what would be achieved by more training and reinforcement of expectations.  The prescription for more oversight is also puzzling.  It appears to assume that violations will continue unless there is strict monitoring of behaviors.  Interestingly, it relies on more oversight by managers who apparently weren’t providing the necessary oversight in the first place.

So on the one hand the corrective actions identified in these events do not appear well suited to the nature of a willful violation.  Perhaps more importantly, this treatment of the problem obscures deeper analysis of why such violations are occurring in the first place.  Why are personnel deciding to intentionally do something wrong?  Often willful acts have their basis in personal gain or covering up some other misdeed.  Nothing in the seven instances in the Notice even hints at this type of motivation.  Could it be an intent to do harm to the organization due to some other personal issue - a problem with a supervisor, being passed over for a promotion, etc.?  Hmmm, I guess it’s possible but again there does not appear to be any hint of this in the available documentation.  Or could it be that the individuals were responding to some actual or perceived pressure to get something done - more quickly, at less cost, or to avoid raising an issue that itself would cost time or money?  Again there was no exploration of motive for these violations in the NRC’s or licensee’s investigations.***

The apparent failure to fully investigate the motive for these violations is unfortunate as it leaves other critical factors unexplored and untreated.  Goal pressures almost always have their origin higher up in the organization.  Defaulting to reinforcing the culture side of the equation may not be effective due to the inherent contradiction in signals from upper management. 

In a prior post we suggested that safety culture be thought of as a “pressure boundary”, specifically “the willingness and ability of an organization to resist undue pressure on safety from competing business priorities”.   When resistance breaks down it can lead to shading of safety assessments, a decided lack of rigor in pursuing causes and extent of condition - or it can even lead to willful violations.  Relieving business pressure may be the far more effective antidote.


*  NRC Information Notice 2013-15: Willful Misconduct/Record Falsification and Nuclear Safety Culture (Aug. 23, 2013).  ADAMS ML13142A437.

**  In two instances modest civil penalties were also assessed.

***  We would remind our readers of our post dated April 2, 2012 regarding the guilty plea of one of the Massey coal mine supervisors to intentional violations of the law.  The stated reason: following the law would decrease coal production.

Thursday, August 29, 2013

Normal Accidents by Charles Perrow

This book*, originally published in 1984, is a regular reference for authors writing about complex socio-technical systems.**  Perrow's model for classifying such systems is intuitively appealing; it appears to reflect the reality of complexity without forcing the reader to digest a deliberately abstruse academic construct.  We will briefly describe the model then spend most of our space discussing our problems with Perrow's inferences and assertions, focusing on nuclear power.  

The Model

The model is a 2x2 matrix with axes of coupling and interactions.  Not surprisingly, it is called the Interaction/Coupling (I/C) chart.

“Coupling” refers to the amount of slack, buffer or give between two items in a system.  Loosely coupled systems can accommodate shocks, failures and pressures without destabilizing.  Tightly coupled systems have a higher risk of disastrous failure because their processes are more time-dependent, with invariant sequences and a single way of achieving the production goal, and have little slack. (pp. 89-94)

“Interactions” may be linear or complex.  Linear interactions are between a system component and one or more other components that immediately precede or follow it in the production sequence.  These interactions are familiar and, if something unplanned occurs, the results are easily visible.  Complex interactions are between a system component and one or more other components outside the normal production sequence.  If unfamiliar, unplanned or unexpected sequences occur, the results may not be visible or immediately comprehensible. (pp. 77-78)

Nuclear plants have the tightest coupling and most complex interactions of the two dozen systems Perrow shows on the I/C chart, a population that included chemical plants, space missions and nuclear weapons accidents. (p. 97)

Perrow on Nuclear Power

Let's get one thing out of the way immediately: Normal Accidents is an anti-nuke screed.  Perrow started the book in 1979 and it was published in 1984.  He was motivated to write the book by the TMI accident and it obviously colored his forecast for the industry.  He reviews the TMI accident in detail, then describes nuclear industry characteristics and incidents at other plants, all of which paint an unfavorable portrait of the industry.  He concludes: “We have not had more serious accidents of the scope of Three Mile Island simply because we have not given them enough time to appear.” (p. 60, emphasis added)  While he is concerned with design, construction and operating problems, his primary fear is “the potential for unexpected interactions of small failures in that system that makes it prone to the system accident.” (p. 61)   

Why has his prediction of such serious accidents not come to pass, at least in the U.S.?

Our Perspective on Normal Accidents

We have several issues with this book and the author's “analysis.”

Nuclear is not as complex as Perrow asserts 


There is no question that the U.S. nuclear industry grew quickly, with upsized plants and utilities specifying custom design combinations (in other words, limited standardization).  The utilities were focused on meeting significant load growth forecasts and saw nuclear baseload capacity as an efficient way to produce electric power.  However, actually operating a large nuclear plant was probably more complex than the utilities realized.  But not any more.  Learning curve effects, more detailed procedures and improved analytic methods are a few of the factors that led to a greater knowledge base for plant decision making.  The serious operational issues at the “problem plants” (circa 1997) forced operators to confront the reality that identifying and permanently resolving plant problems was necessary for survival.  This era also saw the beginning of industry consolidation, with major operators applying best methods throughout their fleets.  All of these changes have led to our view that nuclear plants are certainly complicated but no longer complex and haven't been for some time.

This is a good place to point out that Perrow's designation of nuclear plants as the most complex and tightest coupled systems he evaluated has no basis in any real science.  In his own words, “The placement of systems [on the interaction/coupling chart] is based entirely on subjective judgments on my part; at present there is no reliable way to measure these two variables, interaction and coupling.” (p. 96)

System failures with incomprehensible consequences are not the primary problem in the nuclear industry

The 1986 Chernobyl disaster was arguably a system failure: poor plant design, personnel non-compliance with rules and a deficient safety culture.  It was a serious accident but not a catastrophe.*** 

But other significant industry events have not arisen from interactions deep within the system; they have come from negligence, hubris, incompetence or selective ignorance.  For example, Fukushima was overwhelmed by a tsunami that was known to be possible but was ignored by the owners.  At Davis-Besse, personnel ignored increasingly strong signals of a nascent problem, but managers argued that in-depth investigation could wait until the next outage (production trumps safety) and the NRC agreed (with no solid justification).

Important system dynamics are ignored 


Perrow has some recognition of what a system is and how threats can arise within it: “. . . it is the way the parts fit together, interact, that is important.  The dangerous accidents lie in the system, not in the components.” (p. 351)  However, he is/was focused on interactions and couplings as they currently exist.  But a socio-technical system is constantly changing (evolving, learning) in response to internal and external stimuli.  Internal stimuli include management decisions and the reactions to performance feedback signals; external stimuli include environmental demands, constraints, threats and opportunities.  Complacency and normalization of deviance can seep in but systems can also bolster their defenses and become more robust and resilient.****  It would be a stretch to say that nuclear power has always learned from its mistakes (especially if they occur at someone else's plant) but steps have been taken to make operations less complex. 

My own bias is that Perrow doesn't really appreciate the technical side of a socio-technical system.  He recounts incidents in great detail, but not at great depth, and often relies on the work of others.  Although he claims the book is about technology (the socio side, aka culture, is never mentioned), the fact remains that he is not an engineer or physicist; he is a sociologist.

Conclusion

Notwithstanding all my carping, this is a significant book.  It is highly readable.  Perrow's discussion of accidents, incidents and issues in various contexts, including petrochemical plants, air transport, marine shipping and space exploration, is fascinating reading.  His interaction/coupling chart is a useful mental model to help grasp relative system complexity although one must be careful about over-inferring from such a simple representation.

There are some useful suggestions, e.g., establishing an anonymous reporting system, similar to the one used in the air transport industry, for nuclear near-misses. (p. 169)  There is a good discussion of decentralization vs centralization in nuclear plant organizations. (pp. 334-5)  But he says that neither is best all the time, which he considers a contradiction.  The possibility of contingency management, i.e., using a decentralized approach for normal times and tightening up during challenging conditions, is regarded as infeasible.

Ultimately, he includes nuclear power with “systems that are hopeless and should be abandoned because the inevitable risks outweigh any reasonable benefits . . .” (p. 304)*****  As further support for this conclusion, he reviews three different ways of evaluating the world: absolute, bounded and social rationality.  Absolute rationality is the province of experts; bounded rationality recognizes resource and cognitive limitations in the search for solutions.  But Perrow favors social rationality (which we might unkindly call crowdsourced opinions) because it is the most democratic and, not coincidentally, he can cite a study that shows an industry's “dread risk” is highly correlated with its position on the I/C chart. (p. 326)  In other words, if lots of people are fearful of nuclear power, no matter how unreasonable those fears are, that is further evidence to shut it down.

The 1999 edition of Normal Accidents has an Afterword that updates the original version.  Perrow continues to condemn nuclear power but without much new data.  Much of his disapprobation is directed at the petrochemical industry.  He highlights writers who have advanced his ideas and also presents his (dis)agreements with high reliability theory and Vaughan's interpretation of the Challenger accident.

You don't need this book in your library but you do need to be aware that it is a foundation stone for the work of many other authors.

 

*  C. Perrow, Normal Accidents: Living with High-Risk Technologies (Princeton Univ. Press, Princeton, NJ: 1999).

**  For example, see Erik Hollnagel, The ETTO Principle: Efficiency-Thoroughness Trade-Off (reviewed here); Woods, Dekker et al., Behind Human Error (reviewed here); and Weick and Sutcliffe, Managing the Unexpected: Resilient Performance in an Age of Uncertainty (reviewed here).  It's ironic that Perrow set out to write a readable book without references to the “sacred texts” (p. 11) but it appears Normal Accidents has become one.

***  Perrow's criteria for catastrophe appear to be: “kill many people, irradiate others, and poison some acres of land.” (p. 348)  While any death is a tragedy, reputable Chernobyl studies report fewer than 100 deaths from radiation and project about 4,000 radiation-induced cancer deaths in a population of 600,000 people who were exposed.  The same population is expected to suffer 100,000 cancer deaths from all other causes.  Approximately 40,000 square miles of land was significantly contaminated.  Data from Chernobyl Forum, "Chernobyl's Legacy: Health, Environmental and Socio-Economic Impacts," 2nd rev. ed.  Retrieved Aug. 27, 2013.  Wikipedia, “Chernobyl disaster.”  Retrieved Aug. 27, 2013.

In his 1999 Afterword to Normal Accidents, Perrow mentions Chernobyl in passing and his comments suggest he does not consider it a catastrophe, although it could have been one had the wind blown the radioactive materials over the city of Kiev.

****  A truly complex system can drift into failure (Dekker) or experience incidents from performance excursions outside the safety boundaries (Hollnagel).

*****  It's not just nuclear power, Perrow also supports unilateral nuclear disarmament. (p. 347)

Thursday, August 15, 2013

No Innocent Bystanders

The stake that sticks up gets hammered down.
We recently saw an article* about organizational bystander behavior.  Organizational bystanders are people who sense or believe that something is wrong—a risk is increasing or a hazard is becoming manifest—but they don't force their organization to confront the issue or they only halfheartedly pursue it.**  This is a significant problem in high-hazard activities; it seems that after a serious incident occurs, there is always someone, or even several someones, who knew the incident's causes existed but didn't say anything.  Why don't these people speak up?

The authors describe psychological and organizational factors that encourage bystander behavior.  Psychological factors are rooted in uncertainty, observing the failure of others to act and the expectation that expert or formal authorities will address the problem.  Fear is a big factor: fear of being wrong, fear of being chastised for thinking above one's position or outside one's field of authority, fear of being rejected by the work group even if one's concerns are ultimately shown to be correct or fear of being considered disloyal; in brief, fear of the dominant culture. 

Organizational factors include the processes and constraints the organization uses to filter information and make decisions.  Such factors include limiting acceptable information to that which comports with the organization's basic assumptions, and rigid hierarchical and role structures—all components of the organization's culture.  Other organizational factors, e.g., resource constraints and external forces, apply pressure on the culture.  In one type of worst case, “imposing nonnegotiable performance objectives combined with severe sanctions for failure encourages the violation of rules, reporting distortions, and dangerous, sometimes illegal short-cuts.” (p. 52)  Remember Massey Energy and the Upper Big Branch mine disaster?

The authors provide a list of possible actions to mitigate the likelihood of bystander behavior.  Below we recast some of these actions as desirable organizational (or cultural) attributes.

  • Mechanisms exist for encouraging and expressing dissenting points of view;
  • Management systems balance the need for short-term performance with the need for productive inquiry into potential threats;
  • Approaches exist to follow up on near-misses and other “weak signals” [an important attribute of high reliability organizations];
  • Disastrous but low probability events are identified and contingency plans prepared;
  • Performance reviews, self-criticism, and a focus on learning at all levels are required.
Even in such a better world, “bystander behavior is not something that can be 'fixed' once and for all, as it is a natural outgrowth of the interplay of human psychology and organizational forces. The best we can hope for is to manage it well, and, by so doing, help to prevent catastrophic outcomes.” (p.53) 

Our Perspective

This paper presents a useful discussion of the interface between the individual and the organization under problematic conditions, viz., when the individual sees something that may be at odds with the prevailing world view.  It's important to realize that even if the organizational factors are under control, many people will still be reluctant to rock the boat, even when the risk they see is to the boat itself.

The authors correctly emphasize the important role of leadership in developing the desirable organizational attributes; however, as we have argued elsewhere, leadership can influence, but not unilaterally specify, organizational culture.

We would like to see more discussion of systemic processes.  For example, the impact of possible negative feedback on the individual is described but positive feedback, such as through the compensation, recognition and reward systems, is not discussed.  Organizational learning (adaptation) is mentioned but not well developed.

The article mentions the importance of independent watchdogs.  We note that in the nuclear industry, the regulator plays an important role in encouraging bystanders to get involved and protecting them if they do.

The article concludes with a section on the desirable contributions of the human resources (HR) department.  It is, quite frankly, unrealistic (it overstates the role and authority of HR in nuclear organizations I have seen) but was probably necessary to get the article published in an HR journal. 


*  M.S. Gerstein and R.B. Shaw, “Organizational Bystanders,” People and Strategy 31, no. 1 (2008), pp. 47-54.  Thanks to Madalina Tronea for publicizing this article on the LinkedIn Nuclear Safety group.  Dr. Tronea is the group's founder/manager.

**  This is a bit different from the classic bystander effect which refers to a situation where the more people present when help is needed, the less likely any one of them is to provide the help, each one expecting others to provide assistance. 

Wednesday, August 7, 2013

Nuclear Industry Scandal in South Korea

As you know, over the past year trouble has been brewing in the South Korean nuclear industry.  A recent New York Times article* provides a good current status report.  The most visible problem is the falsification of test documents for nuclear plant parts.  Executives have been fired, employees of both a testing company and the state-owned entity that inspects parts and validates their safety certificates have been indicted.

It should be no surprise that the underlying causes are rooted in the industry structure and culture.  South Korea has only one nuclear utility, state-owned Korea Electric Power Corporation (Kepco).  Kepco retirees go to work for parts suppliers or invest in them.  Cultural attributes include valuing personal ties over regulations, and school and hometown connections.  Bribery is used as a lubricating agent.

As a consequence, “In the past 30 years, our nuclear energy industry has become an increasingly closed community that emphasized its specialty in dealing with nuclear materials and yet allowed little oversight and intervention,” the government’s Ministry of Trade, Industry and Energy said in a recent report to lawmakers. “It spawned a litany of corruption, an opaque system and a business practice replete with complacency.”

Couldn't happen here, right?  I hope not, but the U.S. nuclear industry, while not as closed a system as its Korean counterpart, is hardly an open community.  The “unique and special” mantra promotes insular thinking and encourages insiders to view outsiders with suspicion.  The secret practices of the industry's self-regulator do not inspire public confidence.  A familiar cast of NEI/INPO participants at NRC stakeholder meetings fuels concern over the degree to which the NRC has been captured by industry.  Utility business decisions that ultimately killed plants (CR3, Kewaunee, San Onofre) appear to have been made in conference rooms isolated from any informed awareness of worst-case technical/commercial consequences.  Our industry has many positive attributes, but others should make us stop and reflect.

*  C. Sang-Hun, “Scandal in South Korea Over Nuclear Revelations,” New York Times (Aug. 3, 2013).  Retrieved Aug. 6, 2013.