Tuesday, October 29, 2013

NRC Outreach on the Safety Culture Policy Statement

An NRC public meeting
Last August 7th, the NRC held a public meeting to discuss their outreach initiatives to inform stakeholders about the Safety Culture Policy Statement (SCPS).  A meeting summary was published in October.*  Much of the discussion covered what we'll call the bureaucratization of safety culture (SC)—development of communication materials (including a poster, brochure, case studies and a website), presentations at conferences and meetings (some international), and training.  However, there were some interesting tidbits, discussed below.

One NRC presentation covered the SC Common Language Initiative.**  The presenter remarked that an additional SC trait, Decision Making (DM), was added during the development of the common language.  In our Feb. 28, 2013 review of the final common language document, we praised the treatment of DM; it is a principal creator of artifacts that reflect an organization's SC.

The INPO presentation noted that “After the common language effort was completed in January, 2013, INPO published Revision 1 of 12-012, which includes all of the examples developed during the common language workshop.” (p. 6)  We reviewed that INPO document here.

But here's the item that got our attention.  During a presentation on NRC outreach, an industry participant cautioned the NRC to not put policy statements into regulatory documents because policy statements are an expectation, not a regulation. The senior NRC person at the meeting agreed with the comment “and the importance of the NRC not overstepping the Commission’s direction that implementing the SCPS is not a regulatory requirement, but rather the Commission’s expectations.” (p. 4)

We find the last comment disingenuous.  We have previously posted on how the NRC has created de facto regulation of SC.***  In the absence of clear de jure regulation, licensees and the NRC end up playing “bring me another rock” until the NRC accepts a licensee's pronouncements, as verified by NRC inspectors.  For an example of this convoluted kabuki, read Bob Cudlin's Jan. 30, 2013 post on how Palisades' efforts to address a plant incident finally gained NRC acceptance, or at least an NRC opinion that Palisades' SC was “adequate and improving.”

We'll keep you posted on SCPS-related activities.


*  D.J. Sieracki to R.P. Zimmerman, “Summary of the August 7, 2013, Public Meeting between the U.S. Nuclear Regulatory Commission Staff and Stakeholders to Exchange Information and Discuss Ongoing Education and Outreach Associated with the Safety Culture Policy Statement” (Oct. 1, 2013).  ADAMS ML13267A385.  We continue to find it ironic that the SCPS is administered by the NRC's Office of Enforcement.  Isn't OE's primary focus on people and companies who violate the NRC's regulations?

**  “The common language initiative uses the traits from the SCPS as a basic foundation, and contains definitions and examples to describe each trait more fully.” (p. 3)

***  For related posts, please click the "Regulation of Safety Culture" label.

Friday, October 18, 2013

When Apples Decay

In our experience, education is perceived as a continual process, accumulating knowledge progressively over time.  A shiny apple exemplifies the learning student or an inspiring insight (see Newton, Sir Isaac).  Less consideration is given to the fact that the educational process can work in reverse, leading to a loss of capability over time.  In other words, the apple decays.  As Martin Weller notes on his blog The Ed Techie, “education is about selling apples...we need to recognise and facilitate learning that takes ten minutes or involves extended participation in a community over a number of years.”*

This leads us to a recent Wall Street Journal piece, “Americans Need a Simple Retirement System”.**  The article is about the failure of educational efforts to improve financial literacy.  We admit that this is a bit out of context for nuclear safety culture; nevertheless it provides a useful perspective that seems to be overlooked within the nuclear industry.  The article notes:

“The problem is that, like all educational efforts, financial education decays over time and has negligible effects on behavior after 20 months. The authors suggest that, given this decay, “just in time” financial education...might be a more effective way to proceed.”

We tend to view the safety culture training provided at nuclear plants to be of the 10-minute variety, selling apples that may vary in size and color but are just apples.  Additional training is routinely prescribed in response to findings of inadequate safety culture.  Yet we cannot recall a single reported instance where safety culture issues were associated with inadequate or ineffective training in the first place.  Nor do we see explicit recognition that such training efforts have very limited half-lives, creating cycles of future problems.  We have blogged about the decay of training-based reinforcement (see our March 22, 2010 post) and the contribution of decay and training saturation to complacency (see our Dec. 8, 2011 post).
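The decay dynamic is easy to sketch.  Assuming training effectiveness decays exponentially with some half-life (the numbers below are illustrative assumptions on our part, not figures from the cited studies), the arithmetic shows why a one-time training intervention fades:

```python
def retained_effectiveness(initial, months_elapsed, half_life_months):
    """Exponential decay of training effectiveness over time.

    initial: effectiveness immediately after training (1.0 = 100%)
    half_life_months: months for effectiveness to fall by half (assumed)
    """
    return initial * 0.5 ** (months_elapsed / half_life_months)

# Illustrative only: with an assumed 10-month half-life, effectiveness
# 20 months after training has fallen to one quarter of its initial value.
print(retained_effectiveness(1.0, 20, 10))  # prints 0.25
```

The point of the sketch is not the particular half-life, which no one has measured for safety culture training, but that any positive half-life implies periodic reinforcement or “just in time” delivery, not one-shot indoctrination.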

The fact that safety culture knowledge and “strength” decays over time is just one example of the dynamics associated with safety management.  Arguably one could assert that an effective learning process itself is a (the?) key to building and maintaining strong safety culture.  And further that it is consistently missing in current nuclear industry programs that emphasize indoctrination in traits and values.  It’s time for better and more innovative approaches - not just more apples.



*  M. Weller, "The long-awaited 'education as fruit' metaphor," The Ed Techie blog (Sept. 10, 2009).  Retrieved Oct. 18, 2013.

**  A.H. Munnell, "Americans need a simple retirement system," MarketWatch blog (Oct. 16, 2013).  Retrieved Oct. 18, 2013.

Monday, October 14, 2013

High Reliability Management by Roe and Schulman

This book* presents a multi-year case study of the California Independent System Operator (CAISO), the government entity created to operate California's electricity grid when the state deregulated its electricity market.  CAISO's travails read like The Perils of Pauline but our primary interest lies in the authors' observations of the different grid management strategies CAISO used under various operating conditions; it is a comprehensive description of contingency management in the real world.  In this post we summarize the authors' management model, discuss the application to nuclear management and opine on the implications for nuclear safety culture.

The High Reliability Management (HRM) Model

The authors call the model they developed High Reliability Management and present it in a 2x2 matrix where the axes are System Volatility and Network Options Variety. (Ch. 3)  System Volatility refers to the magnitude and rate of change of  CAISO's environmental variables including generator and transmission availability, reserves, electricity prices, contracts, the extent to which providers are playing fair or gaming the system, weather, temperature and electricity demand (regional and overall).  Network Options Variety refers to the range of resources and strategies available for meeting demand (basically in real time) given the current inputs. 

System Volatility and Network Options Variety can each be High or Low so there are four possible modes and a distinctive operating management approach for each.  All modes must address CAISO's two missions of matching electricity supply and demand, and protecting the grid.  Operators must manage the system inside an acceptable or tolerable performance bandwidth (invariant output performance is a practical impossibility) in all modes.  Operating conditions are challenging: supply and demand are inherently unstable (p. 34), inadequate supply means some load cannot be served and too much generation can damage the grid. (pp. 27, 142)

High Volatility and High Options mean both generation (supply) and demand are changing quickly and the operators have multiple strategies available for maintaining balance.  Some strategies can be substituted for others.  It is a dynamic but manageable environment.

High Volatility and Low Options mean both generation and demand are changing quickly but the operators have few strategies available for maintaining balance.  They run from pillar to post; it is highly stressful.  Sometimes they have to create ad hoc (undocumented and perhaps untried) approaches using trial and error.  Demand can be satisfied but regulatory limits may be exceeded and the system is running closer to the edge of technical capabilities and operator skills.  It is the most unstable performance mode and untenable because the operators are losing control and one perturbation can amplify into another. (p. 37)

Low Volatility and Low Options mean generation and demand are not changing quickly.  The critical feature here is demand has been reduced by load shedding.  The operators have exhausted all other strategies for maintaining balance.  It is a command-and-control approach, effected by declaring a  Stage 3 grid situation and run using formal rules and procedures.  It is the least desirable domain because one primary mission, to meet all demand, is not being accomplished. 

Low Volatility and High Options is an HRM's preferred mode.  Actual demand follows the forecast, generators are producing as expected, reserves are on hand, and there is no congestion on transmission lines or backup routes are available.  Procedures based on analyzed conditions exist and are used.  There are few, if any, surprises.  Learning can occur but it is incremental, the result of new methods or analysis.  Performance is important and system behavior operates within a narrow bandwidth.  Loss of attention (complacency) is a risk.  Is this starting to sound familiar?  This is the domain of High Reliability Organization (HRO) theory and practice.  Nuclear power operations is an example of an HRO. (pp. 60-62)          

Lessons for Nuclear Operations 


Nuclear plants work hard to stay in the Low Volatility/High Options mode.  If they stray into the Low Options column, they run the risks of facing unanalyzed situations and regulatory non-compliance. (p. 62)  In their effort to optimize performance in the desired mode, plants examine their performance risks to ever finer granularity through new methods and analyses.  Because of the organizations' narrow focus, few resources are directed at identifying, contemplating and planning for very low probability events (the tails of distributions) that might force a plant into a different mode or have enormous potential negative consequences.**  Design changes (especially new technologies) that increase output or efficiency may mask subtle warning signs of problems; organizations must be mindful of performance drift and nascent problems.

In an HRO, trial and error is not an acceptable method for trying out new options.  No one wants cowboy operators in the control room.  But examining new options using off-line methods, in particular simulation, is highly desirable. (pp. 111, 233)  In addition, building reactive capacity in the organization can be a substitute for foresight to accommodate the unexpected and unanalyzed. (pp. 116-17)  

The focus on the external changes that buffeted CAISO leads to a shortcoming when looking for lessons for nuclear.  The book emphasizes CAISO's adaptability to new environmental demands, requirements and constraints but does not adequately recognize the natural evolution of the system.  In nuclear, it's natural evolution that may quietly lead to performance drift and normalization of deviance.  In a similar vein, CAISO has to worry about complacency in just one mode, for nuclear it's effectively the only mode and complacency is an omnipresent threat. (p. 126) 

The risk of cognitive overload occurs more often for CAISO operators but it has visible precursors; for nuclear operators the risk is that overload might occur suddenly and with little or no warning.***  Anticipation and resilience are more obvious needs at CAISO but also necessary in nuclear operations. (pp. 5, 124)

Implications for Safety Culture

Both HRMs and HROs need cultures that value continuous training, open communications, team players able to adjust authority relationships when facing emergent issues, personal responsibility for safety (i.e., safety does not inhere in technology), ongoing learning to do things better and reduce inherent hazards, rewards for achieving safety and penalties for compromising it, and an overall discipline dedicated to failure-free performance. (pp. 198, App. 2)  Both organizational types need a focus on operations as the central activity.  Nuclear is good at this, certainly better than CAISO where entities outside of operations promulgated system changes and the operators were stuck with making them work.

The willingness to report errors should be encouraged but we have seen that is a thin spot in the SC at some plants.  Errors can be a gateway into learning how to create more reliable performance and error tolerance vs. intolerance is a critical cultural issue. (pp. 111-12, 220) 

The simultaneous need to operate within a prescribed envelope while considering how the envelope might be breached has implications for SC.  We have argued before that a nuclear organization is well-served by having a diversity of opinions and some people who don't subscribe to group think and instead keep asking “What's the worst case scenario and how would we manage it to an acceptable conclusion?”

Conclusion

This review gives short shrift to the authors' broad and deep description and analysis of CAISO.****  The reason is that the major takeaway for CAISO, viz., the need to recognize mode shifts and switch management strategies accordingly as the manifestation of “normal” operations, is not really applicable to day-to-day nuclear operations.

The book describes a rare breed, the socio-technical-political start-up, and has too much scope for the average nuclear practitioner to plow through searching for newfound nuggets that can be applied to nuclear management.  But it's a good read and full of insightful observations, e.g., the description of  CAISO's early days (ca. 2001-2004) when system changes driven by engineers, politicians and regulators, coupled with changing challenges from market participants, prevented the organization from settling in and effectively created a negative learning curve with operators reporting less confidence in their ability to manage the grid and accomplish the mission in 2004 vs. 2001. (Ch. 5)

(High Reliability Management was recommended by a Safetymatters reader.  If you have a suggestion for material you would like to see promoted and reviewed, please contact us.)

*  E. Roe and P. Schulman, High Reliability Management (Stanford Univ. Press, Stanford, CA: 2008).  This book reports the authors' study of CAISO from 2001 through 2006.

**  By their nature as baseload generating units, usually with long-term sales contracts, nuclear plants are unlikely to face a highly volatile business environment.  Their political and social environment is similar: The NRC buffers them from direct interference by politicians although activists prodding state and regional authorities, e.g., water quality boards, can cause distractions and disruptions.

The importance of considering low-probability, major consequence events is argued by Taleb (see here) and Dédale (see here).

***  Over the course of the authors' investigation, technical and management changes at CAISO intended to make operations more reliable often had the unintended effect of moving the edge of the prescribed performance envelope closer to the operators' cognitive and skill capacity limits. 

The Cynefin model describes how organizational decision making can suddenly slip from the Simple domain to the Chaotic domain via the Complacent zone.  For more on Cynefin, see here and here.

****  For instance, ch. 4 presents a good discussion of the inadequate or incomplete applicability of Normal Accident Theory (Perrow, see here) or High Reliability Organization theory (Weick, see here) to the behavior the authors observed at CAISO.  As an example, tight coupling (a threat according to NAT) can be used as a strength when operators need to stitch together an ad hoc solution to meet demand. (p. 135)

Ch. 11 presents a detailed regression analysis linking volatility in selected inputs to volatility in output, measured by the periods when electricity made available (compared to demand) fell outside regulatory limits.  This analysis illustrated how well CAISO's operators were able to manage in different modes and how close they were coming to the edge of their ability to control the system, in other words, performance as precursor to the need to go to Stage 3 command-and-control load shedding.