Friday, October 14, 2011

Decision No. 2 Scoring Results

In July we initiated a process for readers to participate in evaluating the extent to which actual decisions made at nuclear plants were consistent with a strong safety culture.  (The decision scoring framework is discussed here and the results for the first decision are discussed here.)  Example decision 2 involved a temporary repair to a Service Water System piping elbow.  Performance of a permanent code repair was postponed until the next cold shutdown or refuel outage.

We asked readers to assess the decision in two dimensions: potential safety impact and the strength of the decision, using anchored scales to quantify the scores.  The chart shows the scoring results.  Our interpretation of the results is as follows:

As with the first decision, most of the scores coalesced in a limited range for each scoring dimension.  Based on the anchored scales, this meant most people thought the safety impact was fairly significant, likely because the temporary repair could remain in place until the next refuel outage.  The people who scored safety significance in this range also scored the decision strength as one that reasonably balanced safety and other operational priorities.  Our interpretation here is that people viewed the temporary repair as a reasonable interim measure, sufficient to maintain an adequate safety margin.  Notwithstanding that most scores were in the mid range, there were also decision strength scores as low as 3 (safety had lower priority than desired) and as high as 9 (safety had high priority where competing priorities were significant).  Across this range of decision strength scores, the scores for safety impact were consistent at 8.  This clearly illustrates the potential for varying perceptions of whether a decision is consistent with a strong safety culture.  The variation could reflect how people felt about the efficacy of the temporary repair, or simply different standards or expectations for how aggressively one should address the leakage problem.

It is not very difficult to see how this scoring variability could translate into similarly mixed safety culture survey results.  But unlike survey questions which tend to be fairly general and abstract, the decision scoring results provide a definitive focus for assessing the “why” of safety culture perceptions.  Training and self assessment activities could benefit from these data as well.  Perhaps most intriguing is the question of what level of decision strength is expected in an organization with a “strong” safety culture.  Is it 5 (reasonably balances…) or is something higher, in the 6 to 7 range, expected?  We note that the average decision strength for example 2 was about 5.2.

Stay tuned for more on decision scoring.

Saturday, October 8, 2011

You Want Safety Culture? Then Pass a Law.

On October 7, 2011, California Governor Brown signed SB 705, authored by state Senator Mark Leno.

The Leno bill, among many others, was inspired by a major gas pipeline explosion that occurred September 9, 2010 in San Bruno, CA resulting in multiple fatalities.  The ensuing investigations have identified a familiar litany of contributing causes: defective welds, ineffective maintenance practices, missing and incomplete records, and lax corporate management.

SB 705 adds Sections 961 and 963 to the Public Utilities Code.  Section 961 requires each gas corporation to “develop a plan for the safe and reliable operation of its commission-regulated gas pipeline facility. . . .”* (§ 961(b)(1))

Section 963 states “It is the policy of the state that the commission and each gas corporation place safety of the public and gas corporation employees as the top priority. [emphasis added]  The commission shall take all reasonable and appropriate actions necessary to carry out the safety priority policy of this paragraph consistent with the principle of just and reasonable cost-based rates.”* (§ 963(b)(3))

I was surprised that an unambiguous statement about safety’s importance was apparently missing from the state’s code.  I give Senator Leno full credit for this vital contribution.

Of course, he couldn’t leave well enough alone and was quoted as saying “It’s not going to fix the situation overnight, but it changes the culture immediately.”** [emphasis added]

Now this comment is typical political braggadocio, and the culture will not change “immediately.”  However, this law will make safety more prominent on the corporate radar and eventually there should be responsive changes in policies, practices, procedures and behaviors.

*  Bill Text: CA Senate Bill 705 - 2011-2012 Regular Session

**  W. Buchanan, “Governor signs bill forcing automatic pipe valves,” S.F. Chronicle (Oct. 8, 2011). 

Monday, September 26, 2011

Beyond Training - Reinforcing Culture

One of our recurring themes has been how to strengthen safety culture, either to sustain an acceptable level of culture or to address weaknesses and improve it.  We have been skeptical of the most common initiative - retraining personnel on safety culture principles and values.  Simply put, we don’t believe you can PowerPoint or poster your way to culture improvement.

By comparison we were more favorably inclined to some of the approaches put forth in a recent New York Times interview of Andrew Thompson, a Silicon Valley entrepreneur.  As Thompson observes,

“...it’s the culture of what you talk about, what you celebrate, what you reward, what you make visible.  For example, in this company, which is very heavily driven by intellectual property, if you file a patent or have your name on a patent, we give you a little foam brain.”*

Foam “brains”.  How clever.  He goes on to describe other ideas such as employees being able to recognize each other for demonstrating desired values by awarding small gold coins (a nice touch here as the coins have monetary value that can be realized or retained as a visible trophy), and volunteer teams that work on aspects of culture.  The common denominator of much of this: management doesn’t do it, employees do.

*  A. Bryant, “Speak Frankly, but Don’t Go ‘Over the Net’,” New York Times (September 17, 2011).

Monday, September 12, 2011

Understanding the Risks in Managing Risks

Our recent blog posts have discussed the work of anthropologist Constance Perin.  This post looks at her book, Shouldering Risks: The Culture of Control in the Nuclear Power Industry.*  The book presents four lengthy case studies of incidents at three nuclear power plants and Perin’s analysis which aims to explain the cultural attributes that facilitated the incidents’ occurrence or their unfavorable evolution.

Because they fit nicely with our interest in decision-making, this post will focus on the two case studies that concerned hardware issues.**  The first case involved a leaking, unisolable valve in the reactor coolant system (RCS) that needed repacking, a routine job.  The mechanics put the valve on its backseat, opened it, observed the packing moving up (indicating that the water pressure was too high or the backseat step hadn't worked), and closed it up.  After management meetings to review the situation, the mechanics tried again, packing came out, and the leak became more serious.  The valve stem and disc had separated, a fact that was belatedly recognized.  The leak was eventually sufficiently controlled so the plant could wait until the next outage to repair/replace the valve.  

The second case involved a switchyard transformer that exhibited a hot spot during a thermography examination.  Managers initially thought they had a circulating current issue, a common problem.  After additional investigations, including people climbing on ladders up alongside the transformer, a cover bolt was removed and the employee saw a glow inside the transformer, the result of a major short.  Transformers can explode, and have exploded, from such thermal stresses, but the plant was able to safely shut down to repair/replace the transformer.

In both cases, there was at least one individual who knew (or strongly suspected) that something more serious was wrong from the get-go but was unable to get the rest of the organization to accept a more serious, i.e., costly, diagnosis.

Why were the plant organizations so willing, even eager, to assume the more conventional explanations for the problems they were seeing?  Perin provides a multidimensional framework that helps answer that question.

The first dimension is the tradeoff quandary, the ubiquitous tension between production and cost, including costs associated with safety.  Plant organizations are expected to be making electricity, at a budgeted cost, and that subtle (or not-so-subtle) pressure colors the discussion of any problem.  There is usually a preference for a problem explanation and corrective action that allows the plant to continue running.

Three control logics constitute a second dimension.  The calculated logics are the theory of how a plant is (or should be) designed, built, and operated.  The real-time logics consist of the knowledge of how things actually work in practice.  Policy logics come from above, and represent generalized guidelines or rules for behavior, including decision-making.  An “answer” that comes from calculated or policy logic will be preferred over one that comes from real-time logic, partly because the former have been developed by higher-status groups and partly because such answers are more defensible to corporate bosses and regulators.

Finally, traditional notions of group and individual status and a key status property, credibility, populate a third dimension: design engineers over operators over system engineers over maintenance over others; managers over individual contributors; old-timers over newcomers.  Perin creates a construct of the various "orders"*** in a plant organization, specialists such as operators or system engineers.  Each order has its own worldview, values and logics – optimum conditions for nurturing organizational silos.  Information and work flows are mediated among different orders via plant-wide programs (themselves products of calculated and policy logics).
 
Application to Cases

The aforementioned considerations can be applied to the two cases.  Because the valve was part of the RCS, it should have been subject to more detailed planning, including additional risk analysis and contingency preparation.  This was pointed out by a new-to-his-job work planner who was basically ignored because of his newcomer status.  And before the work was started, the system engineer (SE) observed that this type of valve (which had a problem history at this plant and elsewhere) was prone to stem/disc separation and that this particular valve appeared to have the problem based on his visual inspection (it had one thread less visible than other similar valves).  But the SE did not make his observations forcefully and/or officially (by initiating a condition report, or CR) so his (accurate) observation was not factored into the early decision-making.  Ultimately, their concerns did not sway the overall discussion, in which the schedule was the highest priority.  A radiographic examination that would have shown the stem/disc separation was not performed early on because that was an Engineering responsibility and the valve repair was a Maintenance project.

The transformer is on the non-nuclear side of the plant, which makes the attitudes toward it less focused and critical than for safety-related equipment.  The hot spot was discovered by a tech who was working with a couple of thermography consultants.  Thermography was a relatively new technology at this plant and was neither well understood by plant managers nor fully trusted, because early applications had given false alarms.  The tech said that the patterns he observed were not typical for circulating currents but neither he nor the consultants (the three people on-site who understood thermography) were in the meetings where the problem was discussed.  The circulating current theory was popular because (a) the plant had experienced such problems in the past and (b) addressing it could be done without shutting down the plant.  Production pressure, the nature of past problems, and the lower status of roles and equipment that are not safety related all acted to suppress the emergent new knowledge of what the problem actually was.

Lessons Learned

Perin’s analytic constructs are complicated and not light reading.  However, the interviews in the case studies are easy to read and very revealing.  It will come as no surprise to people with consulting backgrounds that the interviewees were capable of significant introspection.  In the harsh light of hindsight, lots of folks can see what should (and could) have happened.  

The big question is what did those organizations learn?  Will they make the same mistakes again?  Probably not.  But will they misinterpret future weak or ambiguous signals of a different nascent problem?  That’s still likely.  “Conventional wisdom” codified in various logics and orders and guided by a production imperative remains a strong force working against the open discussion of alternative explanations for new experiences, especially when problem information is incomplete or fuzzy.  As Bob Cudlin noted in his August 17, 2011 post: [When dealing with risk-imbued issues] “the intrinsic uncertainties in significance determination opens the door to the influence of other factors - namely those ever present considerations of cost, schedule, plant availability, and even more personal interests, such as incentive programs and career advancement.”

   
*  C. Perin, Shouldering Risks: The Culture of Control in the Nuclear Power Industry, (Princeton, NJ: Princeton University Press, 2005).

**  The case studies and Perin’s analysis have been greatly summarized for this blog post.

***  The “orders” include outsiders such as NRC, INPO or corporate overseers.  Although this may not be totally accurate, I picture orders as akin to medieval guilds.

Wednesday, August 17, 2011

Additional Thoughts on Significance Culture

Our previous post introduced the work of Constance Perin, Visiting Scholar in Anthropology at MIT, including her thesis of “significance culture” in nuclear installations.  Here we expand on the intersection of her thesis with some of our work.

Perin places primary emphasis on the availability and integration of information to systematize and enhance the determination of risk significance.  This becomes the true organizing principle of nuclear operational safety and supplants the often hazy construct of safety culture.  We agree with the emphasis on more rigorous and informed assessments of risk as an organizing principle and focus for the entire organization. 

Perin observes: “Significance culture arises out of a knowledge-using and knowledge-creating paradigm. Its effectiveness depends less on “management emphasis” and “personnel attitudes” than on having an operational philosophy represented in goals, policies, priorities, and actions organized around effectively characterizing questionable conditions before they can escalate risk.” (Significance Culture, p. 3)*

We found a similar thought from Kenneth Brawn on a recent LinkedIn post under the Nuclear Safety Group.  He states, “Decision making, and hence leadership, is based on accurate data collection that is orchestrated, focused, real time and presented in a structured fashion for a defined audience….Managers make decisions based on stakeholder needs – the problem is that risk is not adequately considered because not enough time is taken (given) to gather and orchestrate the necessary data to provide structured information for the real time circumstances.” ** 

While seeing the potential unifying force of significance culture, we are mindful also that such determinations often are made under a cloak of precision that is not warranted or routinely achievable.  Such analyses are complex, uncertain, and subject to considerable judgment by the involved analysts and decision makers.  In other words, they are inherently fuzzy.  This limitation can only be partly remedied through better availability of information.  Nuclear safety does not generally include “bright lines” of acceptable or unacceptable risks, or finely drawn increments of risk.  Sure, PRA analyses and other “risk informed” approaches provide the illusion of quantitative precision, and often provide useful insight for devising courses of action that do not pose “undue risk” to public safety.  But one does not have to read too many Licensee Event Reports (LERs) to see that risk determinations are ultimately shades of gray.  For one example, see the background information on our decision scoring example involving a pipe leak in a 30” moderate energy piping elbow and interim repair.  The technical justification for the interim fix included terms such as “postulated”, “best estimate” and “based on the assumption”.  A full reading of the LER makes clear the risk determination involved considerable qualitative judgment by the licensee in making its case and the NRC in approving the interim measure. That said, the NRC’s justification also rested in large part on a finding of “hardship or unusual difficulty” if a code repair were to be required immediately.

Where is this leading us?  Are poor safety decisions the result of the lack of quality information?  Perhaps.  However, another scenario, at least equally likely, is that the appropriate risk information may not be pursued vigorously or the information may be interpreted in the light most favorable to the organization’s other priorities.  We believe that the intrinsic uncertainties in significance determination opens the door to the influence of other factors - namely those ever present considerations of cost, schedule, plant availability, and even more personal interests, such as incentive programs and career advancement.  Where significance is fuzzy, it invites rationalization in the determination of risk and marginalization of the intrinsic uncertainties.  Thus a desired decision outcome could encourage tailoring of the risk determination to achieve the appropriate fit.  It may mean that Perin’s focus on “effectively characterizing questionable conditions” must also account for the presence and potential influence of other non-safety factors as part of the knowledge paradigm.

This brings us back to Perin’s ideas for how to pull the string and dig deeper into this subject.  She finds, “Condition reports and event reviews document not only material issues. Uniquely, they also document systemic interactions among people, priorities, and equipment — feedback not otherwise available.” (Significance Culture, p.5)  This emphasis makes a lot of sense and in her book, Shouldering Risks: The Culture of Control in the Nuclear Power Industry, she takes up the challenge of delving into the depths of a series of actual condition reports.  Stay tuned for our review of the book in a subsequent post.


*  C. Perin, “Significance Culture in Nuclear Installations,” a paper presented at the 2005 Annual Meeting of the American Nuclear Society (June 6, 2005).

**  You may be asked to join the LinkedIn Nuclear Safety group to view Mr. Brawn's comment and the discussion of which it is part.

Friday, August 12, 2011

An Anthropologist’s View

Academics in many disciplines study safety culture.  This post introduces to this blog the work of an MIT anthropologist, Constance Perin, and discusses a paper* she presented at the 2005 ANS annual meeting.

We picked a couple of the paper’s key recommendations to share with you.  First, Perin’s main point is to advocate the development of a “significance culture” in nuclear power plant organizations.  The idea is to organize knowledge and data in a manner that allows an organization to determine significance with respect to safety issues.  The objective is to increase an organization’s capabilities to recognize and evaluate questionable conditions before they can escalate risk.  We generally agree with this aim.  The real nub of safety culture effectiveness is how it shapes the way an organization responds to new or changing situations.

Perin understands that significance evaluation already occurs in both formal processes (e.g., NRC evaluations and PRAs) and in the more informal world of operational decisions, where trade-offs, negotiations, and satisficing behavior may be more dynamic and less likely to be completely rational.  She recommends that significance evaluation be ascribed a higher importance, i.e., be more formally and widely ingrained in the overall plant culture, and used as an organizing principle for defining knowledge-creating processes. 

Second, because of the importance of a plant's Corrective Action Program (CAP), Perin proposes making NRC assessment of the CAP the “eighth cornerstone” of the Reactor Oversight Process (ROP).  She criticizes the NRC’s categorization of cross cutting issues for not being subjected to specific criteria and performance indicators.  We have a somewhat different view.  Perin’s analysis does not acknowledge that the industry places great emphasis on each of the cross cutting issues in terms of performance indicators and monitoring including self assessment.**  The same is true of the other cornerstones, where the plants use many more indicators to track and trend performance than the few included in the ROP.  In our opinion, a real problem with the ROP is that its few indicators do not provide any reliable or forward looking picture of nuclear safety.

The fault line in the CAP itself may better be characterized in terms of the lack of measurement and assessment of how well the CAP functions to sustain a strong safety culture.  Importantly, such an approach would evaluate whether decisions on conditions adverse to quality not only properly assessed significance but also balanced the influence of any competing priorities.  Perin also recognizes that competing priorities exist, especially in the operational world, but making the CAP a cornerstone might actually lead to increased false confidence in the CAP if its relationship with safety culture was left unexamined.

Prof. Perin has also written a book, Shouldering Risks: The Culture of Control in the Nuclear Power Industry,*** which is an ethnographic analysis of nuclear organizations and specific events they experienced.  We will be reviewing this book in a future post.  We hope that her detailed drill down on those events will yield some interesting insights, e.g., how different parts of an organization looked at the same situation but had differing evaluations of its risk implications.

We have to admit we didn’t detect Prof. Perin on our radar screen; she alerted us to the presence of her work.  Based on our limited review to date, we think we share similar perspectives on the challenges involved in attaining and maintaining a robust safety culture.


*  C. Perin, “Significance Culture in Nuclear Installations,” a paper presented at the 2005 Annual Meeting of the American Nuclear Society (June 6, 2005).

** The issue may be one of timing.  Prof. Perin based her CAP recommendation, in part, on a 2001 study that suggested licensees’ self-regulation might be inadequate.  We have the benefit of a more contemporary view.  

*** C. Perin, Shouldering Risks: The Culture of Control in the Nuclear Power Industry, (Princeton, NJ: Princeton University Press, 2005).

Friday, July 15, 2011

Decision Scoring No. 2

This post introduces the second decision scoring example.  Click here, or the box above this post, to access the detailed decision summary and scoring feature.  

This example involves a proposed non-code repair to a leak in the elbow of service water system piping.  By opting for a non-code, temporary repair, a near term plant shutdown will be avoided but the permanent repair will be deferred for as long as 20 months.  In grading this decision for safety impact and decision strength, it may be helpful to think about what alternatives were available to this licensee.  We could think of several:

-    not perform a temporary repair as current leakage was within tech spec limits, but implement an augmented inspection and monitoring program to identify any further degradation in a timely manner.

-    perform the temporary repair as described but commit to perform the permanent repair within a shorter time period, say 6 months.

-    immediately shut down and perform the code repair.

Each of these alternatives would likely affect the potential safety impact of this leak condition and influence the perception of the decision strength.  For example, a decision to shut down immediately and perform the code repair would likely be viewed as quite conservative, certainly more conservative than the other options.  Such a decision might provide the strongest reinforcement of safety culture.  The point is that none of these decisions is necessarily right or wrong, or good or bad.  They do, however, reflect more or less conservatism, and ultimately say something about safety culture.