Sunday, October 5, 2014

Update on INPO Safety Culture Study

On October 22, 2010, we reported on an INPO study that correlated safety culture (SC) survey data with safety performance measures.  A more complete version of the analysis was published in an academic journal* this year, and this post expands on our previous comments.

Summary of the Paper

The new paper begins with a brief description of SC and related research.  Earlier research suggests that some modest relationship exists between SC and safety performance but the studies were limited in scope.  Longitudinal (time-based) studies have yielded mixed results.  Overall, this leaves plenty of room for new research efforts.

According to the authors, “The current study provides a unique contribution to the safety culture literature by examining the relationship between safety culture and a diverse set of performance measures [NRC industry trends, ROP data and allegations, and INPO plant data] that focus on the overall operational safety of a nuclear power plant.” (p. 39)  They hypothesized small to medium correlations between current SC survey data and eleven then-current (2010) and future (2011) safety performance measures.**

The 110-item survey instrument was distributed across the U.S. nuclear industry and 2876 useable responses were received from employees and contractors representing almost all U.S. plants.  Principal components analysis (PCA) was applied to the survey data and resulted in nine useful factors.***  Survey items that did not have a high factor loading (on a single factor) or presented analysis problems were eliminated, resulting in 60 useful survey items.  Additional statistical analysis showed that the survey responses from each individual site were similar and the various sites had different responses on the nine factors.
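The item-reduction step described above can be illustrated with a toy sketch.  This is not the paper's actual procedure or data; it is a hypothetical stand-in using synthetic Likert-style responses, a plain principal components extraction, an assumed Kaiser-style eigenvalue cutoff, and an assumed |0.4| single-factor loading rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the 2876 x 110 survey matrix (respondents x items):
# here, 300 respondents answering 12 items driven by 3 latent factors.
n_resp, n_items, n_factors = 300, 12, 3
latent = rng.normal(size=(n_resp, n_factors))
loadings_true = rng.normal(size=(n_factors, n_items))
responses = latent @ loadings_true + rng.normal(scale=0.5, size=(n_resp, n_items))

# PCA via eigendecomposition of the item correlation matrix
corr = np.corrcoef(responses, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)          # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Retain components with eigenvalue > 1 (Kaiser criterion, one common rule)
k = int(np.sum(eigvals > 1.0))

# Item loadings on the retained components
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])

# Keep only items that load strongly (|loading| > 0.4) on exactly one retained
# factor, mirroring the paper's pruning from 110 items down to 60
strong = np.abs(loadings) > 0.4
keep = np.flatnonzero(strong.sum(axis=1) == 1)
print("factors retained:", k, " items kept:", len(keep))
```

The paper does not specify its retention or loading thresholds in the detail shown here; the 1.0 eigenvalue cutoff and 0.4 loading rule are conventional defaults used purely for illustration.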

Statistically significant correlations were observed between both overall SC and individual SC factors and the safety performance measures.****  A follow-on regression analysis suggested “that the factors collectively accounted for 23–52% of the variance in concurrent safety performance.” (p. 45)

“The significant correlations between overall safety culture and measures of safety performance ranged from -.26 to -.45, suggesting a medium effect and that safety culture accounts for 7–21% of the variance in most of the measures of safety performance examined in this study.” (p. 45)
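As a quick arithmetic check on the quoted range, the "7–21% of variance" figures are simply the squares of the reported correlation coefficients (r², the coefficient of determination):

```python
# Variance explained is the square of the correlation coefficient.
for r in (-0.26, -0.45):
    print(f"r = {r:+.2f}  ->  r^2 = {r * r:.4f} of variance")
```

The endpoints square to 0.0676 and 0.2025, i.e., roughly the 7–21% range the authors report.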

Here is an example of a specific finding: “The most consistent relationship across both the correlation and regression analyses seemed to be between the safety culture factor questioning attitude, and the outcome variable NRC allegations. . . .Questioning attitude was also a significant predictor of concurrent counts of inspection findings associated with ROP cross-cutting aspects, the cross-cutting area of human performance, and total number of SCCIs. Fostering a questioning attitude may be a particularly important component of the overall safety culture of an organization.” (p. 45)

And another: “It is particularly interesting that the only measure of safety performance that was not significantly correlated with safety culture was industrial safety accident rate.” (p. 46)

The authors caution that “The single administration of the survey, combined with the correlational analyses, does not permit conclusions to be drawn regarding a causal relationship between safety culture and safety performance.  In particular, the findings presented here are exploratory, mainly because the correlational analyses cannot be used to verify causality and the data used represent snapshots of safety culture and safety performance.” (p. 46)

The relationships between SC and current performance were stronger than between SC and future performance.  This should give pause to those who would rush to use SC data as a leading indicator. 

Our Perspective 

This is a dense paper and important details may be missing from this summary.  If you are interested in this topic then you should definitely read the original and our October 22, 2010 post.

That recognizable factors emerged from the PCA should not be a surprise.  In fact, the opposite would have been the real surprise.  After all, the survey was constructed to include previously identified SC traits.  The nine factors mapped well against previously identified SC traits and INPO principles.

However, there was no explanation, in either the original presentation or this paper, of why the 11 safety performance measures were chosen out of a large universe.  After all, the NRC and INPO collect innumerable types of performance data.  Was there some cherry picking here?  I have no idea but it creates an opportunity for a statistical aside, presented in a footnote below.*****

The authors attempt to explain some correlations by inventing a logic that connects the SC factor to the performance measure.  But it is just speculation because, as the authors note, correlation is not causality.  You should look at the correlation tables and see if they make sense to you, or if some different processes are at work here.

One aspect of this paper bothers me a little.  In the October 22, 2010 NRC public meeting, the INPO presenter said the analysis was INPO’s, while an NRC presenter said NRC staff had reviewed and accepted the INPO analysis, which had been verified by an outside NRC contractor.  For this paper, those two presenters are joined by another NRC staffer as co-authors.  That is a notable difference.  It passes the smell test, but it does evidence a close working relationship between an independent public agency and a secretive private entity.

*  S.L. Morrow, G.K. Koves and V.E. Barnes, “Exploring the relationship between safety culture and safety performance in U.S. nuclear power operations,” Safety Science 69 (2014), pp. 37–47.  ADAMS ML14224A131.

**  The eleven performance measures included seven NRC measures (Unplanned scrams, NRC allegations, ROP cross-cutting aspects, Human performance cross-cutting inspection findings, Problem identification and resolution cross-cutting inspection findings, Substantive cross-cutting issues in the human performance or problem identification and resolution area, and ROP action matrix oversight, i.e., which column a plant is in) and four INPO measures (Chemistry performance, Human performance error rate, Forced loss rate and Industrial safety accident rate).

***  The nine SC factors were management commitment to safety, willingness to raise safety concerns, decision making, supervisor responsibility for safety, questioning attitude, safety communication, personal responsibility for safety, prioritizing safety and training quality.

****  Specifically, 13 (out of 22) overall SC correlations with the current and future performance measures were significant as were 84 (out of 198) individual SC factor correlations.

*****  It would be nice to know if any background statistical testing was performed to pick the performance measures.  This is important because if one calculates enough correlations, or any other statistic, one will eventually get some false positives (Type I errors).  One way to counteract this problem is to establish a more restrictive threshold for significance, e.g., 0.01 vs. 0.05 or 0.005 vs. 0.01.  This note is simply my cautionary view.  I am not suggesting there are any methodological problems in the subject paper.
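The footnote's point about false positives can be demonstrated with a quick simulation.  This is purely illustrative and unrelated to the paper's data: it correlates pure noise against 198 independent noise series (echoing the count of individual-factor correlations mentioned above) and counts how many cross the nominal p < .05 line, using a large-sample normal approximation for the critical correlation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Pure-noise demonstration: none of these correlations is "real",
# yet some will look significant at alpha = .05 by chance alone.
n_obs, n_tests = 100, 198
x = rng.normal(size=(n_obs, n_tests))
y = rng.normal(size=n_obs)          # independent of every column of x

r = np.array([np.corrcoef(x[:, j], y)[0, 1] for j in range(n_tests)])

# Large-sample two-tailed critical |r| is roughly z / sqrt(n_obs)
crit_05 = 1.96 / np.sqrt(n_obs)     # alpha = .05
crit_bonf = 3.66 / np.sqrt(n_obs)   # roughly alpha = .05 / 198 (Bonferroni-style)

fp_05 = int(np.sum(np.abs(r) > crit_05))
fp_bonf = int(np.sum(np.abs(r) > crit_bonf))
print(f"'significant' at .05: {fp_05} of {n_tests}")
print(f"after a stricter threshold: {fp_bonf} of {n_tests}")
```

At the .05 level one expects roughly ten chance "hits" out of 198 tests on pure noise, and nearly none after the stricter threshold, which is why an explicit multiple-comparisons adjustment matters when so many correlations are computed at once.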


  1. I have probably spent an unreasonable amount of time, by any sensible measure, trying to put my finger on what I take to be problematic with this paper. It is not hard to find things to be dissatisfied with - starting with the first sentence of the abstract. From the outset, the paper is framed as suggesting that in the US there is a "single national culture" which shapes NPP worker views. Anyone who has been to several plants in each of the NRC Regions will immediately doubt how deep the perspective is of the authors who made that assertion.

    The authors report: "For each nuclear power plant, the mean score for the total survey results and the factor means were correlated with organization-level performance indicators both concurrently and one year following the survey administration." The entire paper is an attempt to make this mechanical exercise in statistical correlation yield at least a hint that "safety culture begets regulatory performance."

    The effort was doomed to failure if one considers that the NRC has been inspecting aggregate performance at individual NPPs with a strong conservative streak since the TMI Accident. Up until the point where INPO got traction across the entire national fleet's performance, the result of the NRC "setting the bar" with the precautionary principle was plants spending lots of time in shutdown. Rigorous oversight of Tech Spec LCO compliance was successful in assuring that poor performance degraded promptly into shutdown. One would have to look at the entire System of Systems architecture to see this pattern.

    Between 1980 and 2000 we saw the period of improving management for reliability (i.e. availability). Along the way, as fleet performance began to approach design maximums, the significance of latent management weakness became more conspicuous (cf. the Peach Bottom recovery from control room sleeping, or the Millstone QA breakdown). Throughout this period, NRC continued to be pretty unforgiving of most corner cutting once it came to light via the Inspection Program. As always, design Defense in Depth provided ample margins against big accidents.

    With the shift to the ROP, the NRC made adjustments to account for higher unit performance. The performance cornerstones plus the cross-cutting areas provided good operators a reward for self-maintained vigilance. Three decades of statistical data gave NRC a pretty good idea of what sorts of baseline values in the PIs were achievable at the mean.

    If the industry wanted more and more to be performance based by "objective measures," Davis Besse punched a hole in their aspirations. NRC responded to DB by doubling down on its own commitment to conservative margins regarding "management getting it."

    What is troubling about this paper is the extent to which its suppositions and hypotheses blatantly ignore all this history. When the researchers go into factor analysis for correlations among questions on their survey, they come up with nine categories - and speak of them as if they were newly uncovered. In fact they are virtually identical to the secondary factors of cross-cutting area performance that were incorporated in the NRC Inspection Manual as far back as 2005-6 to reflect lessons learned from DB. They far predate the enumeration of traits and attributes in the NRC's 2011 Safety Culture Policy and related fallout in places like INPO 12-012 Rev 1.

    In similar fashion, the paper's authors ignore the role that the Significance Determination Process and the annual roll-up of total licensee performance play in promoting, through rigorous and steady feedback applied in a graded fashion, the very aggregate performance measures against which they hope to benchmark their snapshot survey. That is just crappy science practice in my view.

  2. Continuing the previous comment:

    A conclusion I thought was particularly telling regarded the significance of workers' sense of personal responsibility for NSC effectiveness. They report finding no correlation between questions about this "factor" and the baseline performance. Given that HU and individual commitment are afforded a place near the top of the list of important attributes, this finding would seem to be quite significant; it doesn't seem to have been recognized as such, or perhaps the authors just didn't know how to fit it into the forced arc of their master narrative.

    The paper seems intent on making the case for using self-reported attitudinal surveys as leading indicators of future performance - without regard to the actual results of continuing inspections. As there is some indication that NEI would be happy to have this outcome and would use it to secure reduced frequency (and onerousness) of regular inspections, and to eliminate those nasty "subjective" judgments about cross-cutting area indicators of management effectiveness, it is worrisome that two NRC staffers are working so hard to support such an argument.

    In summary: "Correlations suggested meaningful, statistically significant relationships between
    safety culture, as measured by the survey, and multiple nuclear power plant performance indicators." What the authors failed to discover is that the Inspection Program drives the performance indicators and thus the perceptions workers have. There is no reason shown here to believe that the reverse causal relationship will ever obtain

