Monday, October 20, 2014

DNFSB Hearings on Safety Culture, Round Three

DNFSB Headquarters

On October 7, 2014 the Defense Nuclear Facilities Safety Board (DNFSB) held its third and final hearing* on safety culture (SC) at Department of Energy (DOE) nuclear facilities.  The original focus of the hearings was the Hanford Waste Treatment Plant (WTP) but this hearing also discussed the Waste Isolation Pilot Plant (WIPP), the Pantex plant and other facilities.  There were three presenters: DOE Secretary Moniz and two of his top lieutenants.  A newspaper article** published the same day reported key points made during the hearing and you should read that article along with this post.  This post focuses on items not included in the newspaper article, including the tone of the hearing and other nuances.  The presenters used no slides and the hearing transcript has not yet been released.  The only current record of the hearing is a DNFSB video.

Secretary Moniz

Moniz has been Secretary for about a year-and-a-half.  In his view, the keys to improving SC are training, consistent senior management attention, and procurement modifications, i.e., DOE’s intent to revise RFP and contracting processes to include SC expectations.  He also said fostering the consideration of SC in all decisions, including resource allocation, is important.  Board member Sullivan asked about the SC issues at Pantex and Moniz provided a generic answer about improving self-assessments and sharing lessons learned but ultimately punted to the next presenter, Ms. Creedon.

Principal Deputy Administrator Creedon, National Nuclear Security Administration (NNSA)

Creedon has been in her position for two months.  She believes NNSA employees get the job done in spite of bureaucracy but they need greater trust in senior management who, in turn, must work harder to engage the workforce.  Returning to the Pantex*** issues, Sullivan asked why the recommendations of the plant’s outside technical advisors had been ignored for years.  Creedon said she would work to improve communications up and down the organization.  In a separate exchange, she provided an example of positive reinforcement where NNSA employees can receive cash awards ($500) for good work. 

Creedon’s prior position was in the Department of Defense.  To the extent she has the warfighter mentality (“Anything, anywhere, anytime…at any cost”),**** balancing mission and safety may not come naturally to her.  Her response to a question on this topic was not encouraging; she claimed the motto du jour for NNSA (“Mission First, People Always”) adequately addresses safety's priority but it obviously doesn’t even mention safety.

Acting Assistant Secretary for Environmental Management Whitney

Whitney is also new in his job but not to DOE, coming from DOE Oak Ridge.  He laid out his goals of establishing trust, a questioning attitude and mutual respect.  He was asked about a SC assessment finding that DOE senior managers don’t feel responsible for safety; instead, they see it as belonging to the site leads or one of the EM mission support units.  Whitney said that was unacceptable and described the intent to add SC factors to senior management evaluations.  He also repeated the plan to upgrade the WTP contractor evaluation to include SC factors.  He noted that most employees stay at one site for their entire career, making it hard to transfer SC from site to site.

Our Perspective

The overall tone of the hearing was collegial.  The Board expressed support and encouragement for the presenters, all of whom are relatively new in their jobs.  The presenters all stayed on message and reinforced each other.  For example, for WTP one message is “We know there are still significant SC issues at WTP but we have the right team in place and are taking action and making progress.  Changing a decades-old culture takes time.”  Whitney received more of a (polite) grilling probably because the WTP and the WIPP are under his purview.

We are totally supportive of DOE’s stated intent to add SC factors to contracts and senior management evaluations.  When players have skin in the game, the chances of seeing desired behavioral changes are greatly increased.  We are equally supportive of Secretary Moniz’ desire to create a culture that incorporates safety considerations in all decisions.

DOE is trying to make its employees more conscious of safety’s importance; two thousand managers have gone through SC training and there’s more to come.  Now we’re starting to worry about the drumbeat of SC creating a Weltanschauung where a strong SC is a sine qua non for good outcomes and a weak SC is always present when bad outcomes occur.  Organizational reality is more complicated.  An organization with a mediocre SC can achieve satisfactory results if other effective controls and incentives are in place; an organization with a strong SC can still make poor decisions.  And luck can run good or bad for anyone.

*  DNFSB Oct. 7, 2014 Safety Culture Public Meeting and Hearing.  We posted on the first hearing on June 9, 2014 and the second hearing on Sept. 4, 2014.

**  A. Cary, “Moniz says safety culture at Hanford vit plant led to problems,” Tri-City Herald (Oct. 7, 2014).

***  NNSA's responsibilities include Pantex which has recognized SC issues.

****  See the third footnote in our Sept. 4, 2014 post.

Monday, October 13, 2014

Systems Thinking in Air Traffic Management

A recent white paper* presents ten principles to consider when thinking about a complex socio-technical system, specifically European Air Traffic Management (ATM).  We review the principles below, highlighting aspects that might provide some insights for nuclear power plant operations and safety culture (SC).

Before we start, we should note that ATM is truly a complex** system.  Decisions involving safety and efficiency occur on a continuous basis.  There is always some difference between work-as-imagined and work-as-done.

In contrast, we have argued that a nuclear plant is a complicated system, though it has some elements of complexity.  To the extent complexity exists, treating nuclear like a complicated machine via “analysing components using reductionist methods; identifying ‘root causes’ of problems or events; thinking in a linear and short-term way; . . . [or] making changes at the component level” is inadequate. (p. 5)  In other words, systemic factors may contribute to observed performance variability and frustrate nuclear’s goal of eliminating all differences between work-as-planned and work-as-done.

Principles 1-3 relate to the view of people within systems – our view from the outside and their view from the inside.

1. Field Expert Involvement
“To understand work-as-done and improve how things really work, involve those who do the work.” (p. 8)
2. Local Rationality
“People do things that make sense to them given their goals, understanding of the situation and focus of attention at that time.” (p. 10)
3. Just Culture
“Adopt a mindset of openness, trust and fairness. Understand actions in context, and adopt systems language that is non-judgmental and non-blaming.” (p. 12)

Nuclear is pretty good at getting line personnel involved.  Adages such as “Operations owns the plant” are useful to the extent they are true.  Cross-functional teams can include operators or maintenance personnel.  An effective corrective action program (CAP) that allows workers to identify and report problems with equipment, procedures, etc. is good; an evaluation and resolution process that involves members from the same class of workers is even better.  Having someone involved in an incident or near-miss go around to the tailgates and classes to share the lessons learned can be convincing.

But when something unexpected or bad happens, nuclear tends to spend too much time looking for the malfunctioning component (usually human).   “The assumption is that if the person would try harder, pay closer attention, do exactly what was prescribed, then things would go well. . . . [But a] focus on components becomes less effective with increasing system complexity and interactivity.” (p. 4)  An outside-in approach ignores the context in which the human performed, the information and time available, the competition for focus of attention, the physical conditions of the work, fatigue, etc.  Instead of insight into system nuances, the result is often limited to more training, supervision or discipline.

The notion of a “just culture” comes from James Reason.  It’s a culture where employees are not punished for their actions, omissions or decisions that are commensurate with their experience and training, but where gross negligence, willful violations and destructive acts are not tolerated.

Principles 4 and 5 relate to the system conditions and context that affect work.

4. Demand and Pressure
“Demands and pressures relating to efficiency and capacity have a fundamental effect on performance.” (p. 14)
5. Resources & Constraints
“Success depends on adequate resources and appropriate constraints.” (p. 16)

Fluctuating demand creates far more varied and unpredictable problems for ATM than it does in nuclear.  However, in nuclear the potential for goal conflicts between production, cost and safety is always present.  The problem arises from acting as if these conflicts don’t exist.

ATM has to “cope with variable demand and variable resources,” a situation that is also different from nuclear with its base load plants and established resource budgets.  The authors opine that for ATM, “a rigid regulatory environment destroys the capacity to adapt constantly to the environment.” (p. 2) Most of us think of nuclear as quite constrained by procedures, rules, policies, regulations, etc., but an important lesson from Fukushima was that under unforeseen conditions, the organization must be able to adapt according to local, knowledge-based decisions.  Even the NRC recognizes that “flexibility may be necessary when responding to off-normal conditions.”***

Principles 6 through 10 concern the nature of system behavior, with 9 and 10 more concerned with system outcomes.  These do not have specific implications for SC other than keeping an open mind and being alert to systemic issues, e.g., complacency, drift or emergent behavior.

6. Interactions and Flows
“Understand system performance in the context of the flows of activities and functions, as well as the interactions that comprise these flows.” (p. 18)
7. Trade-Offs
“People have to apply trade-offs in order to resolve goal conflicts and to cope with the complexity of the system and the uncertainty of the environment.” (p. 20)
8. Performance Variability
“Understand the variability of system conditions and behaviour.  Identify wanted and unwanted variability in light of the system’s need and tolerance for variability.” (p. 22)
9. Emergence
“System behaviour in complex systems is often emergent; it cannot be reduced to the behaviour of components and is often not as expected.” (p. 24)
10. Equivalence
“Success and failure come from the same source – ordinary work.” (p. 26)

Work flow certainly varies in ATM but is relatively well-understood in nuclear.  There’s really not much more to say on that topic.

Trade-offs occur in decision making in any context where more than one goal exists.  One useful mental model for conceptualizing trade-offs is Hollnagel’s efficiency-thoroughness construct, basically doing things quickly (to meet the production and cost goals) vs. doing things well (to meet the quality and possibly safety goals).  We reviewed his work on Jan. 3, 2013.

Performance variability occurs in all systems, including nuclear, but the outcomes are usually successful because a system has a certain range of tolerance and a certain capacity for resilience.  Performance drift happens slowly, and can be difficult to identify from the inside.  Dekker’s work speaks to this and we reviewed it on Dec. 5, 2012.

Nuclear is not fully complex but surprises do happen, some of them not caused by component failure.  Emergence (problems that arise from new or unforeseen system interactions) is more likely to occur following the implementation of new technical systems.  We discussed this possibility in a July 6, 2013 post on a book by Woods, Dekker et al.

Equivalence means that work that results in both good and bad outcomes starts out the same way, with people (saboteurs excepted) trying to be successful.  When bad things happen, we should cast a wide net in looking for different factors, including systemic ones, that aligned (like Swiss cheese slices) in the subject case.

The white paper also includes several real and hypothetical case studies illustrating the application of the principles to understanding safety performance challenges.

Our Perspective 

The authors draw on a familiar cast of characters, including Dekker, Hollnagel, Leveson and Reason.  We have posted about all of them; just click on their labels in the right-hand column.

The principles are intended to help us form a more insightful mental model of a system under consideration, one that includes non-linear cause and effect relationships, and the possibility of emergent behavior.  The white paper is not a “must read” but may stimulate useful thinking about the nature of the nuclear operating organization.

*  European Organisation for the Safety of Air Navigation (EUROCONTROL), “Systems Thinking for Safety: Ten Principles” (Aug. 2014).  Thanks to Bill Mullins for bringing this white paper to our attention.

**  “[C]omplex systems involve large numbers of interacting elements and are typically highly dynamic and constantly changing with changes in conditions. Their cause-effect relations are non-linear; small changes can produce disproportionately large effects. Effects usually have multiple causes, though causes may not be traceable and are socially constructed.” (pp. 4-5)

Also see our Oct. 14, 2013 discussion of the California Independent System Operator for another example of a complex system.

***  “Work Processes,” NRC Safety Culture Trait Talk, no. 2 (July 2014), p. 1.  ADAMS ML14203A391.  Retrieved Oct. 8, 2014.

Sunday, October 5, 2014

Update on INPO Safety Culture Study

On October 22, 2010 we reported on an INPO study that correlated safety culture (SC) survey data with safety performance measures.  A more complete version of the analysis was published in an academic journal* this year and this post expands on our previous comments.

Summary of the Paper

The new paper begins with a brief description of SC and related research.  Earlier research suggests that some modest relationship exists between SC and safety performance but the studies were limited in scope.  Longitudinal (time-based) studies have yielded mixed results.  Overall, this leaves plenty of room for new research efforts.

According to the authors, “The current study provides a unique contribution to the safety culture literature by examining the relationship between safety culture and a diverse set of performance measures [NRC industry trends, ROP data and allegations, and INPO plant data] that focus on the overall operational safety of a nuclear power plant.” (p. 39)  They hypothesized small to medium correlations between current SC survey data and eleven then-current (2010) and future (2011) safety performance measures.**

The 110-item survey instrument was distributed across the U.S. nuclear industry and 2876 usable responses were received from employees and contractors representing almost all U.S. plants.  Principal components analysis (PCA) was applied to the survey data and resulted in nine useful factors.***  Survey items that did not have a high factor loading (on a single factor) or presented analysis problems were eliminated, resulting in 60 useful survey items.  Additional statistical analysis showed that responses within each site were similar to one another, while the sites differed from each other on the nine factors.
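For readers unfamiliar with the item-screening step, here is a minimal Python sketch of PCA followed by a "high loading on a single factor" screen.  It is not the authors' procedure: the paper's rotation method and cutoffs are not described here, so the varimax rotation, the 0.4 loading cutoff, the 0.3 cross-loading cutoff and the synthetic two-factor survey are all our assumptions (common rules of thumb), chosen only to show how a noise item drops out.

```python
import numpy as np


def pca_loadings(data, n_factors):
    """Loadings (items x factors) of the top principal components of the item correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_factors]
    return eigvecs[:, top] * np.sqrt(eigvals[top])


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation (an assumption; the paper's rotation is not stated here)."""
    n_items, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = np.sum(rotated ** 2, axis=0)
        gradient = loadings.T @ (rotated ** 3 - rotated * column_ss / n_items)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        if criterion > 0 and s.sum() < criterion * (1 + tol):
            break  # criterion has stopped improving
        criterion = s.sum()
    return loadings @ rotation


def keep_clean_items(loadings, high=0.4, cross=0.3):
    """Indices of items loading >= high on one factor and < cross on every other factor."""
    keep = []
    for i, row in enumerate(np.abs(loadings)):
        primary = int(np.argmax(row))
        others = np.delete(row, primary)
        if row[primary] >= high and np.all(others < cross):
            keep.append(i)
    return keep


# Hypothetical survey: items 0-2 tap factor A, items 3-5 tap factor B, item 6 is pure noise.
rng = np.random.default_rng(0)
n_respondents = 2000
factor_a, factor_b = rng.standard_normal((2, n_respondents))


def item(factor):
    return 0.8 * factor + 0.6 * rng.standard_normal(n_respondents)


survey = np.column_stack(
    [item(factor_a) for _ in range(3)]
    + [item(factor_b) for _ in range(3)]
    + [rng.standard_normal(n_respondents)]
)
rotated = varimax(pca_loadings(survey, n_factors=2))
print(keep_clean_items(rotated))  # the noise item (index 6) should be screened out
```

The study went from 110 items to nine factors and 60 retained items; this toy version keeps two factors and seven items, but the logic of the screen is the same.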

Statistically significant correlations were observed between both overall SC and individual SC factors and the safety performance measures.****  A follow-on regression analysis suggested “that the factors collectively accounted for 23–52% of the variance in concurrent safety performance.” (p. 45)
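The "share of variance" language refers to the coefficient of determination (R²) of a least-squares regression of a performance measure on the SC factor scores.  The sketch below computes it with NumPy on made-up data; the 60 sites, the weights and the noise level are hypothetical.  We also compute adjusted R², which penalizes the number of predictors.  We don't know which version the paper reports, but with nine factors and a limited number of plants, plain R² will tend to flatter the fit.

```python
import numpy as np


def r_squared(X, y):
    """Fraction of the variance in y explained by an OLS fit on the columns of X (with intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)


def adjusted_r_squared(r2, n_obs, n_predictors):
    """R-squared penalized for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical data: 60 sites, 9 SC factor scores, one performance measure.
rng = np.random.default_rng(1)
n_sites, n_factors = 60, 9
factors = rng.standard_normal((n_sites, n_factors))
weights = rng.uniform(-0.5, 0.5, n_factors)
performance = factors @ weights + rng.standard_normal(n_sites)

r2 = r_squared(factors, performance)
print(f"R^2 = {r2:.2f}, adjusted R^2 = {adjusted_r_squared(r2, n_sites, n_factors):.2f}")
```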

“The significant correlations between overall safety culture and measures of safety performance ranged from -.26 to -.45, suggesting a medium effect and that safety culture accounts for 7–21% of the variance in most of the measures of safety performance examined in this study.” (p. 45)
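The variance figures come from squaring the correlation coefficient: r² is the fraction of variance the two measures share.  A quick Python check using only the endpoints quoted above gives roughly 7% and 20%; the paper's 21% upper bound may reflect rounding of the reported coefficient.

```python
def variance_explained(r):
    """Share of variance common to two variables with Pearson correlation r."""
    return r ** 2


for r in (-0.26, -0.45):
    print(f"r = {r:+.2f}  ->  r^2 = {variance_explained(r):.3f}")
```

Note that the sign of r does not matter here; a correlation of -.45 explains as much variance as one of +.45.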

Here is an example of a specific finding: “The most consistent relationship across both the correlation and regression analyses seemed to be between the safety culture factor questioning attitude, and the outcome variable NRC allegations. . . .Questioning attitude was also a significant predictor of concurrent counts of inspection findings associated with ROP cross-cutting aspects, the cross-cutting area of human performance, and total number of SCCIs. Fostering a questioning attitude may be a particularly important component of the overall safety culture of an organization.” (p. 45)

And another: “It is particularly interesting that the only measure of safety performance that was not significantly correlated with safety culture was industrial safety accident rate.” (p. 46)

The authors caution that “The single administration of the survey, combined with the correlational analyses, does not permit conclusions to be drawn regarding a causal relationship between safety culture and safety performance.  In particular, the findings presented here are exploratory, mainly because the correlational analyses cannot be used to verify causality and the data used represent snapshots of safety culture and safety performance.” (p. 46)

The relationships between SC and current performance were stronger than between SC and future performance.  This should give pause to those who would rush to use SC data as a leading indicator. 

Our Perspective 

This is a dense paper and important details may be missing from this summary.  If you are interested in this topic then you should definitely read the original and our October 22, 2010 post.

That recognizable factors dropped out of the PCA should not be a surprise.  In fact, the opposite would have been the real surprise.  After all, the survey was constructed to include previously identified SC traits.  The nine factors mapped well against previously identified SC traits and INPO principles. 

However, there was no explanation, in either the original presentation or this paper, of why the 11 safety performance measures were chosen out of a large universe.  After all, the NRC and INPO collect innumerable types of performance data.  Was there some cherry picking here?  I have no idea but it creates an opportunity for a statistical aside, presented in a footnote below.*****

The authors attempt to explain some correlations by inventing a logic that connects the SC factor to the performance measure.  But it is just speculation because, as the authors note, correlation is not causality.  You should look at the correlation tables and see if they make sense to you, or if some different processes are at work here.

One aspect of this paper bothers me a little.  In the October 22, 2010 NRC public meeting, the INPO presenter said the analysis was INPO’s while an NRC presenter said NRC staff had reviewed and accepted the INPO analysis, which had been verified by an outside NRC contractor.  For this paper, those two presenters are joined by another NRC staffer as co-authors.  This is a change from 2010, when the analysis was presented as INPO’s own.  It passes the smell test but does evidence a close working relationship between an independent public agency and a secretive private entity.

*  S.L. Morrow, G.K. Koves and V.E. Barnes, “Exploring the relationship between safety culture and safety performance in U.S. nuclear power operations,” Safety Science 69 (2014), pp. 37–47.  ADAMS ML14224A131.

**  The eleven performance measures included seven NRC measures (Unplanned scrams, NRC allegations, ROP cross-cutting aspects, Human performance cross-cutting inspection findings, Problem identification and resolution cross-cutting inspection findings, Substantive cross-cutting issues in the human performance or problem identification and resolution area, and ROP action matrix oversight, i.e., which column a plant is in) and four INPO measures (Chemistry performance, Human performance error rate, Forced loss rate and Industrial safety accident rate).

***  The nine SC factors were management commitment to safety, willingness to raise safety concerns, decision making, supervisor responsibility for safety, questioning attitude, safety communication, personal responsibility for safety, prioritizing safety and training quality.

****  Specifically, 13 (out of 22) overall SC correlations with the current and future performance measures were significant as were 84 (out of 198) individual SC factor correlations.

*****  It would be nice to know if any background statistical testing was performed to pick the performance measures.  This is important because if one calculates enough correlations, or any other statistic, one will eventually get some false positives (Type I errors).  One way to counteract this problem is to establish a more restrictive threshold for significance, e.g., 0.01 vs. 0.05 or 0.005 vs. 0.01.  This note is simply my cautionary view.  I am not suggesting there are any methodological problem areas in the subject paper.
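To put rough numbers on this caution, here is a Python sketch using the counts in footnote ****: 22 overall correlations plus 198 factor correlations, 220 tests in all.  It assumes, unrealistically, that the tests are independent and that every null hypothesis is true; the real correlations share survey data and overlapping measures, so treat the output as an order-of-magnitude illustration.  The Bonferroni correction shown is one standard way to set a more restrictive threshold, not something the paper says it used.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of 'significant' results if every null hypothesis is true."""
    return n_tests * alpha


def family_wise_error_rate(n_tests, alpha):
    """P(at least one false positive) for independent tests with all nulls true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test threshold that caps the family-wise error rate at family_alpha."""
    return family_alpha / n_tests


n_tests = 22 + 198  # overall-SC plus individual-factor correlations (footnote ****)
observed_significant = 13 + 84
for alpha in (0.05, 0.01, 0.005):
    print(
        f"alpha={alpha}: expected false positives = "
        f"{expected_false_positives(n_tests, alpha):.1f} of {n_tests}, "
        f"P(at least one) = {family_wise_error_rate(n_tests, alpha):.3f}"
    )
print(f"Bonferroni per-test threshold: {bonferroni_threshold(n_tests):.5f}")
print(f"Observed significant correlations: {observed_significant}")
```

If the paper used the conventional 0.05 level, chance alone would be expected to produce about 11 "significant" correlations out of 220, well below the 97 reported; that is consistent with treating this note as a caution rather than a criticism.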