Friday, November 11, 2011

The Mother of Bad Decisions?

This is not about safety culture, but it is nuclear-related and, given our recent emphasis on decision-making, we can’t pass over it without commenting.

The steam generators (SGs) were recently replaced at Crystal River 3.  This was a large and complex undertaking, but SGs have been successfully replaced at many other plants.  The Crystal River project was more complicated because it required cutting an opening in the containment, but this, too, has been successfully accomplished at other plants.

The other SG replacements were all managed by one of two prime contractors, Bechtel or the Steam Generator Team (SGT).  However, to save a few bucks, $15 million actually, Crystal River decided to manage the project itself.  (For perspective, the target cost for the prime contractor, exclusive of incentive fee, was $73 million.)  (Franke, Exh. JF-32, p. 8)*
 
Cutting the opening resulted in delamination of the containment: the outer 10 inches of concrete separated from the 42-inch-thick structure in an area near the opening.  The cost of repairing the plant, plus replacement power, is estimated at more than $2.5 billion.**  It’s not clear when the plant will be running again, if ever.
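
Some back-of-the-envelope arithmetic (ours, not PEF’s, using only the dollar figures cited above) shows how lopsided the bet was.  The sketch below computes the break-even probability: the $15 million saved is wiped out if the chance of a $2.5 billion outcome attributable to self-management exceeds roughly 0.6 percent.

```python
# Back-of-the-envelope break-even check for the self-management decision.
# Dollar figures come from the post; the probability framing is ours,
# not anything PEF calculated.

savings = 15e6    # avoided prime-contractor management cost ($)
downside = 2.5e9  # estimated repair plus replacement power cost ($)

break_even_probability = savings / downside
print(f"Break-even probability of a delamination-scale loss: "
      f"{break_even_probability:.2%}")
# ~0.60%: if self-managing the containment cut added even a 1-in-167 chance
# of this outcome, the expected cost of the decision exceeded the savings.
```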

Progress Energy Florida (PEF), the plant owner, says insurance will cover most of the costs.  We’ll see.  But PEF also wants Florida ratepayers to pay.  PEF claims they “managed and executed the SGR [steam generator replacement] project in a reasonable and prudent manner. . . .”  (Franke, p. 3)

The delamination resulted from “unprecedented and unpredictable circumstances beyond PEF's control and in spite of PEF's prudent management. . . .” (Franke, p. 2)

PEF’s “root cause investigation determined that there were seven factors that contributed to the delamination. . . . These factors combined to cause the delamination during the containment opening activities in a complex interaction that was unprecedented and unpredictable.” [emphasis added]  (Franke, p. 27)***

This is an open docket, i.e., the Florida PSC has not yet determined how much, if anything, the ratepayers will have to pay.  Will the PSC believe that a Black Swan settled at the Crystal River plant?  Or is the word “hubris” more likely to come to mind?


* “Testimony & Exhibits of Jon Franke,” Fla. Public Service Commission Docket No. 100437-EI (Oct. 10, 2011).

**  I. Penn, “Cleaning up a DIY repair on Crystal River nuclear plant could cost $2.5 billion,” St. Petersburg Times via tampabay.com website (Oct. 9, 2011).  This article provides a good summary of the SG replacement project.

***  For the detail-oriented, “. . . the technical root cause of the CR3 wall delamination was the combination of: 1) tendon stresses; 2) radial stresses; 3) industry design engineering analysis inadequacies for stress concentration factors; 4) concrete strength properties; 5) concrete aggregate properties; and 6) the de-tensioning sequence and scope. . . . another factor, the process of removing the concrete itself, likely contributed to the extent of the delamination. . . .” From “Testimony & Exhibits of Garry Miller,” Fla. Public Service Commission Docket No. 100437-EI (Oct. 10, 2011), p. 5.

Wednesday, November 9, 2011

Ultimate Bonuses

Just when you think there is a lack of humor in the exposition of dry but critical issues, such as risk management, our old friend Nassim Nicholas Taleb comes to the rescue.*  His op-ed piece in the New York Times** earlier this week has a subdued title, “End Bonuses for Bankers,” but includes some real eye-openers.  For example, Taleb cites (with barely concealed admiration) the ancient Code of Hammurabi, which protected homeowners by calling for the death of the builder if a house collapsed and killed its owner.  Wait, I thought we were talking about bonuses, not capital punishment.

What Taleb is concerned about is that bonus systems in entities that pose systemic risks almost universally encourage behaviors that may not be consistent with the public good, much less the long-term health of the business entity.  In short, he believes that bonuses provide an incentive to take risks.***  He states, “The asymmetric nature of the bonus (an incentive for success without a corresponding disincentive for failure) causes hidden risks to accumulate in the financial system and become a catalyst for disaster.”  Now just substitute “nuclear operations” for “the financial system”.
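
A minimal sketch of that asymmetry (our illustration with made-up numbers, not Taleb’s) follows.  If the bonus pays a share of gains but exacts no penalty for losses, a strategy that is worse for the company in expectation can still be better for the manager.

```python
# Illustrative only: hypothetical payoffs showing how an asymmetric bonus
# (upside reward, no downside penalty) favors the riskier strategy even
# when that strategy has lower expected value for the company.

def expected_value(outcomes):
    """outcomes: list of (probability, company_result_in_dollars) pairs."""
    return sum(p * v for p, v in outcomes)

def expected_bonus(outcomes, share=0.01):
    """Bonus is a share of gains; losses cost the manager nothing."""
    return sum(p * max(v, 0.0) * share for p, v in outcomes)

safe  = [(1.00, 50e6)]                   # steady, modest result
risky = [(0.90, 100e6), (0.10, -500e6)]  # big upside, rare disaster

print("Company expected value: safe =", expected_value(safe),
      " risky =", expected_value(risky))
print("Manager expected bonus: safe =", expected_bonus(safe),
      " risky =", expected_bonus(risky))
# The risky strategy is worse for the company (40e6 vs 50e6) but better
# for the manager (0.9e6 vs 0.5e6) -- the hidden risk accumulates elsewhere.
```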

Central to Taleb’s thesis is his belief that management has a large informational advantage over outside regulators and will always know more about the risks being taken within its operation.  This affords management the opportunity both to take on additional risk (say, to meet an incentive plan goal) and to camouflage the latent risk from regulators.

In our prior posts [here, here and here] on management incentives within the nuclear industry, we also pointed to the asymmetry of bonus metrics - the focus on operating availability and costs, the lack of metrics for safety performance, and the lack of downside incentive for failure to meet safety goals.  The concern was amplified by the increasing magnitude of nuclear executive bonuses, both in real terms and as a percentage of total compensation.

So what to do?  Taleb’s answer for financial institutions too big to fail is “bonuses and bailouts should never mix”; in other words, “end bonuses for bankers”.  Our answer is, “bonuses and nuclear safety culture should never mix”; “end bonuses for nuclear executives”.  Instead, gross up the compensation of nuclear executives to include the nominal level of expected bonuses.  Then let them manage nuclear operations using their best judgment to assure safety, unencumbered by conflicting incentives.


*  Taleb is best known for The Black Swan, a book focusing on the need to develop strategies, especially financial strategies, that are robust in the face of rare and hard-to-predict events.

**  N. Taleb, “End Bonuses for Bankers,” New York Times website (Nov. 7, 2011).

*** It is widely held that the 2008 financial crisis was exacerbated, if not caused, by executives making more risky decisions than shareholders would have thought appropriate. Alan Greenspan commented: “I made a mistake in presuming that the self-interests of organizations, specifically banks and others, were such that they were best capable of protecting their own shareholders” (Testimony to Congress, quoted in A. Clark and J. Treanor, “Greenspan - I was wrong about the economy. Sort of,” The Guardian, Oct. 23, 2008). The cause is widely thought to be the use of bonuses for performance combined with limited liability.  See also J.M. Malcomson, “Do Managers with Limited Liability Take More Risky Decisions? An Information Acquisition Model”, Journal of Economics & Management Strategy, Vol. 20, Issue 1 (Spring 2011), pp. 83–120.

Friday, November 4, 2011

A Factory for Producing Decisions

The subject of this post is the compelling insights of Daniel Kahneman into behavioral economics and how we think and make decisions.  Kahneman is one of the most influential thinkers of our time and a Nobel laureate.  Two links are provided for our readers who would like additional information.  One is a video interview* via the McKinsey Quarterly, done several years ago; it runs about 17 minutes.  The second is a current review in The Atlantic** of Kahneman’s just-released book, Thinking, Fast and Slow.

Kahneman begins the McKinsey interview by suggesting that we think of organizations as “factories for producing decisions” and therefore, think of decisions as a product.  This seems to make a lot of sense when applied to nuclear operating organizations - they are the veritable “River Rouge” of decision factories.  What may be unusual for nuclear organizations is the large percentage of decisions that directly or indirectly include safety dimensions, dimensions that can be uncertain and/or significantly judgmental, and which often conflict with other business goals.  So nuclear organizations have to deliver two products: competitively priced megawatts and decisions that preserve adequate safety.

To Kahneman, treating decisions as a product logically raises the issue of quality control as a means to ensure their quality.  At one level, quality control might focus on mistakes and on ensuring that decisions avoid repeating them.  But Kahneman sees the quality function going further, into the psychology of the decision process, to ensure, e.g., that the best information is available to decision makers, that the talents of the group surrounding the ultimate decision maker are used effectively, and that the decision-making environment is unbiased.

He notes that there is an enormous amount of resistance within organizations to improving decision processes.  People naturally feel threatened if their decisions are questioned or second-guessed.  So it may be very difficult or even impossible to improve the quality of decisions if the leadership is threatened too much.  But are there ways around this?  Kahneman suggests the “premortem” (think of it as the analog of a postmortem).  When a decision is being formulated (not yet made), convene a group meeting with the following premise: it is a year from now, we have implemented the decision under consideration, and it has been a complete disaster.  Have each individual write down “what happened?”

The objective of the premortem is to legitimize dissent and minimize the innate “bias toward optimism” in decision analysis.  It is based on the observation that as organizations converge toward a decision, dissent becomes progressively more difficult and costly, and people who warn or dissent can be viewed as disloyal.  The premortem essentially sets up a competitive situation to see who can come up with the flaw in the plan.  In essence, everyone takes on the role of dissenter.  Kahneman’s belief is that the process will yield some new insights - insights that may not change the decision but will lead to adjustments that make it more robust.
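
The mechanics are simple enough to sketch in a few lines of code (our illustration, not Kahneman’s; the decision text and canned responses below are hypothetical).  The essential feature is that every participant answers the prompt independently, in writing, before any group discussion.

```python
# A minimal sketch of capturing premortem inputs.  The point is procedural:
# each participant writes down "what happened" independently before the
# group talks, so dissent is solicited rather than penalized.

def run_premortem(decision, participants, get_response):
    prompt = (f"It is one year from now.  We implemented the decision "
              f"'{decision}' and it has been a complete disaster.  "
              f"Write down what happened.")
    # Collect answers independently; no discussion until all are in.
    return {person: get_response(person, prompt) for person in participants}

# Example usage with canned answers standing in for real written responses.
canned = {
    "planner": "We underestimated the work scope and blew the schedule.",
    "system engineer": "A known equipment weakness was dismissed as unlikely.",
    "operations": "Contingency plans were never rehearsed.",
}
results = run_premortem("defer the permanent repair to the next outage",
                        canned.keys(),
                        lambda person, prompt: canned[person])
for person, failure in results.items():
    print(f"{person}: {failure}")
```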

Kahneman’s ideas about decisions resonate with our thinking that the most useful focus for nuclear safety culture is the quality of organizational decisions.  It also contrasts with the response of a nuclear plant that has run afoul of the NRC (Browns Ferry) and is now tagged with a degraded cornerstone and increased inspections.  As usual in the nuclear industry, TVA has called on an outside contractor to come in and perform a safety culture survey, to “... find out if people feel empowered to raise safety concerns….”***  It may be interesting to see how people feel, but we believe it would be far more powerful and useful to analyze a significant sample of recent organizational decisions to determine whether they reflect an appropriate level of concern for safety.  Feelings (perceptions) are not a substitute for what is actually occurring in the decision process.

We have been working to develop ways to grade whether decisions support strong safety culture, including offering opportunities on this blog for readers to “score” actual plant decisions.  In addition we have highlighted the work of Constance Perin including her book, Shouldering Risks, which reveals the value of dissecting decision mechanics.  Perin’s observations about group and individual status and credibility and their implications for dissent and information sharing directly parallel Kahneman’s focus on the need to legitimize dissent.  We hope some of this thinking ultimately overcomes the current bias in nuclear organizations to reflexively turn to surveys and the inevitable retraining in safety culture principles.


*  "Daniel Kahneman on behavioral economics," McKinsey Quarterly video interview (May 2008).

** M. Popova, "The Anti-Gladwell: Kahneman's New Way to Think About Thinking," The Atlantic website (Nov. 1, 2011).

*** A. Smith, "Nuke plant inspections proceeding as planned," Athens [Ala.] News Courier website (Nov. 2, 2011).

Friday, October 14, 2011

Decision No. 2 Scoring Results

In July we initiated a process for readers to participate in evaluating the extent to which actual decisions made at nuclear plants were consistent with a strong safety culture.  (The decision scoring framework is discussed here and the results for the first decision are discussed here.)  Example decision 2 involved a temporary repair to a Service Water System piping elbow.  Performance of a permanent code repair was postponed until the next cold shutdown or refuel outage.

We asked readers to assess the decision in two dimensions: potential safety impact and the strength of the decision, using anchored scales to quantify the scores.  The chart shows the scoring results.  Our interpretation of the results is as follows:

As with the first decision, most of the scores coalesced in a limited range for each scoring dimension.  Based on the anchored scales, this meant most people thought the safety impact was fairly significant, likely due to the extended duration of the temporary repair, which could extend to the next refuel outage.  The people who scored safety significance in this range also scored the decision strength as one that reasonably balanced safety and other operational priorities.  Our interpretation here is that people viewed the temporary repair as a reasonable interim measure, sufficient to maintain an adequate safety margin.  Notwithstanding that most scores were in the mid-range, there were also decision strength scores as low as 3 (safety had lower priority than desired) and as high as 9 (safety had high priority where competing priorities were significant).  Across this range of decision strength scores, the scores for safety impact were consistent at 8.  This clearly illustrates the potential for varying perceptions of whether a decision is consistent with a strong safety culture.  The reasons for the variation could be how people felt about the efficacy of the temporary repair, or simply different standards or expectations for how aggressively one should address the leakage problem.

It is not very difficult to see how this scoring variability could translate into similarly mixed safety culture survey results.  But unlike survey questions, which tend to be fairly general and abstract, the decision scoring results provide a definitive focus for assessing the “why” behind safety culture perceptions.  Training and self-assessment activities could benefit from these data as well.  Perhaps most intriguing is the question of what level of decision strength is expected in an organization with a “strong” safety culture.  Is it 5 (reasonably balances…) or is something higher, in the 6 to 7 range, expected?  We note that the average decision strength for example 2 was about 5.2.
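
For readers who want to try the arithmetic themselves, the sketch below aggregates a set of hypothetical scores chosen to mimic the pattern described above (safety impact clustered near 8, decision strength mostly mid-range with outliers at 3 and 9).  It illustrates the calculation, not the actual reader data.

```python
# Illustrative aggregation of decision scores (hypothetical values chosen
# to roughly reproduce the reported pattern and the ~5.2 average strength).
from statistics import mean, pstdev

scores = [  # (safety_impact, decision_strength), both on 1-10 anchored scales
    (8, 5), (8, 5), (8, 6), (8, 5), (7, 5), (8, 4), (8, 3), (8, 9),
]

impact = [s for s, _ in scores]
strength = [d for _, d in scores]

print(f"Safety impact:     mean {mean(impact):.1f}, spread {pstdev(impact):.1f}")
print(f"Decision strength: mean {mean(strength):.1f}, spread {pstdev(strength):.1f}")
# A tight impact distribution paired with a wide strength distribution is the
# signature discussed above: people agree the issue matters but disagree on
# whether the decision gave safety enough weight.
```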

Stay tuned for more on decision scoring.

Saturday, October 8, 2011

You Want Safety Culture? Then Pass a Law.

On October 7, 2011, California Governor Jerry Brown signed SB 705, authored by state Senator Mark Leno.

The Leno bill, among many others, was inspired by the major gas pipeline explosion that occurred on September 9, 2010 in San Bruno, CA, resulting in multiple fatalities.  The ensuing investigations have identified a familiar litany of contributing causes: defective welds, ineffective maintenance practices, missing and incomplete records, and lax corporate management.

SB 705 adds Sections 961 and 963 to the Public Utilities Code.  Section 961 requires each gas corporation to “develop a plan for the safe and reliable operation of its commission-regulated gas pipeline facility. . . .”* (§ 961(b)(1))

Section 963 states “It is the policy of the state that the commission and each gas corporation place safety of the public and gas corporation employees as the top priority. [emphasis added]  The commission shall take all reasonable and appropriate actions necessary to carry out the safety priority policy of this paragraph consistent with the principle of just and reasonable cost-based rates.”* (§ 963(b)(3))

I was surprised that an unambiguous statement about safety’s importance was apparently missing from the state’s code.  I give senator Leno full credit for this vital contribution.

Of course, he couldn’t leave well enough alone and was quoted as saying “It’s not going to fix the situation overnight, but it changes the culture immediately.”** [emphasis added]

Now this comment is typical political braggadocio, and the culture will not change “immediately.”  However, this law will make safety more prominent on the corporate radar and eventually there should be responsive changes in policies, practices, procedures and behaviors.

*  Bill text: CA Senate Bill 705, 2011-2012 Regular Session.

**  W. Buchanan, “Governor signs bill forcing automatic pipe valves,” S.F. Chronicle (Oct. 8, 2011). 

Monday, September 26, 2011

Beyond Training - Reinforcing Culture

One of our recurring themes has been how to strengthen safety culture, either to sustain an acceptable level of culture or to address weaknesses and improve it.  We have been skeptical of the most common initiative - retraining personnel on safety culture principles and values.  Simply put, we don’t believe you can PowerPoint or poster your way to culture improvement.

By comparison, we were more favorably inclined toward some of the approaches put forth in a recent New York Times interview with Andrew Thompson, a Silicon Valley entrepreneur.  As Thompson observes,

“...it’s the culture of what you talk about, what you celebrate, what you reward, what you make visible.  For example, in this company, which is very heavily driven by intellectual property, if you file a patent or have your name on a patent, we give you a little foam brain.”*

Foam “brains”.  How clever.  He goes on to describe other ideas such as employees being able to recognize each other for demonstrating desired values by awarding small gold coins (a nice touch here as the coins have monetary value that can be realized or retained as a visible trophy), and volunteer teams that work on aspects of culture.  The common denominator of much of this: management doesn’t do it, employees do.

*  A. Bryant, “Speak Frankly, but Don’t Go ‘Over the Net’,” New York Times (September 17, 2011).

Monday, September 12, 2011

Understanding the Risks in Managing Risks

Our recent blog posts have discussed the work of anthropologist Constance Perin.  This post looks at her book, Shouldering Risks: The Culture of Control in the Nuclear Power Industry.*  The book presents four lengthy case studies of incidents at three nuclear power plants and Perin’s analysis which aims to explain the cultural attributes that facilitated the incidents’ occurrence or their unfavorable evolution.

Because they fit nicely with our interest in decision-making, this post will focus on the two case studies that concerned hardware issues.**  The first case involved a leaking, unisolable valve in the reactor coolant system (RCS) that needed repacking, a routine job.  The mechanics put the valve on its backseat, opened it, observed the packing moving up (indicating that the water pressure was too high or the backseat step hadn't worked), and closed it up.  After management meetings to review the situation, the mechanics tried again, packing came out, and the leak became more serious.  The valve stem and disc had separated, a fact that was belatedly recognized.  The leak was eventually sufficiently controlled so the plant could wait until the next outage to repair/replace the valve.  

The second case involved a switchyard transformer that exhibited a hot spot during a thermography examination.  Managers initially thought they had a circulating current issue, a common problem.  After additional investigations, including people climbing on ladders up alongside the transformer, a cover bolt was removed and the employee saw a glow inside the transformer, the result of a major short.  Transformers can, and have, exploded from such thermal stresses but the plant was able to safely shut down to repair/replace the transformer.

In both cases, there was at least one individual who knew (or strongly suspected) that something more serious was wrong from the get-go but was unable to get the rest of the organization to accept a more serious, i.e., costly, diagnosis.

Why were the plant organizations so willing, even eager, to assume the more conventional explanations for the problems they were seeing?  Perin provides a multidimensional framework that helps answer that question.

The first dimension is the tradeoff quandary, the ubiquitous tension between production and cost, including costs associated with safety.  Plant organizations are expected to be making electricity, at a budgeted cost, and that subtle (or not-so-subtle) pressure colors the discussion of any problem.  There is usually a preference for a problem explanation and corrective action that allows the plant to continue running.

Three control logics constitute a second dimension.  The calculated logics are the theory of how a plant is (or should be) designed, built, and operated.  The real-time logics consist of the knowledge of how things actually work in practice.  Policy logics come from above, and represent generalized guidelines or rules for behavior, including decision-making.  An “answer” that comes from calculated or policy logic will be preferred over one that comes from real-time logic, partly because the former have been developed by higher-status groups and partly because such answers are more defensible to corporate bosses and regulators.

Finally, traditional notions of group and individual status, and a key status property, credibility, populate a third dimension: design engineers over operators over system engineers over maintenance over others; managers over individual contributors; old-timers over newcomers.  Perin creates a construct of the various "orders"*** in a plant organization - groups of specialists such as operators or system engineers.  Each order has its own worldview, values and logics – optimum conditions for nurturing organizational silos.  Information and work flows are mediated among the different orders via plant-wide programs (themselves products of calculated and policy logics).
 
Application to Cases

The aforementioned considerations can be applied to the two cases.  Because the valve was part of the RCS, it should have been subject to more detailed planning, including additional risk analysis and contingency preparation.  This was pointed out by a new-to-his-job work planner who was basically ignored because of his newcomer status.  And before the work was started, the system engineer (SE) observed that this type of valve (which had a problem history at this plant and elsewhere) was prone to disc/stem separation and that this particular valve appeared to have the problem based on his visual inspection (it had one less thread visible than other similar valves).  But the SE did not make his observations forcefully and/or officially (by initiating a CR), so his (accurate) observation was not factored into the early decision-making.  Ultimately, their concerns did not sway the overall discussion, where the schedule was the highest priority.  A radiographic examination that would have shown the disc/stem separation was not performed early on because that was an Engineering responsibility and the valve repair was a Maintenance project.

The transformer is on the non-nuclear side of the plant, which makes attitudes toward it less focused and critical than for safety-related equipment.  The hot spot was discovered by a tech who was working with a couple of thermography consultants.  Thermography was a relatively new technology at this plant and not well understood by plant managers (or trusted, because early applications had given false alarms).  The tech said the patterns he observed were not typical of circulating currents, but neither he nor the consultants (the three people on site who understood thermography) were in the meetings where the problem was discussed.  The circulating current theory was popular because (a) the plant had experienced such problems in the past and (b) addressing it could be done without shutting down the plant.  Production pressure, the nature of past problems, and the lower status of roles and equipment that are not safety related all acted to suppress the emergent new knowledge of what the problem actually was.

Lessons Learned

Perin’s analytic constructs are complicated and not light reading.  However, the interviews in the case studies are easy to read and very revealing.  It will come as no surprise to people with consulting backgrounds that the interviewees were capable of significant introspection.  In the harsh light of hindsight, lots of folks can see what should (and could) have happened.  

The big question is what did those organizations learn?  Will they make the same mistakes again?  Probably not.  But will they misinterpret future weak or ambiguous signals of a different nascent problem?  That’s still likely.  “Conventional wisdom” codified in various logics and orders and guided by a production imperative remains a strong force working against the open discussion of alternative explanations for new experiences, especially when problem information is incomplete or fuzzy.  As Bob Cudlin noted in his August 17, 2011 post: [When dealing with risk-imbued issues] “the intrinsic uncertainties in significance determination opens the door to the influence of other factors - namely those ever present considerations of cost, schedule, plant availability, and even more personal interests, such as incentive programs and career advancement.”

   
*  C. Perin, Shouldering Risks: The Culture of Control in the Nuclear Power Industry, (Princeton, NJ: Princeton University Press, 2005).

**  The case studies and Perin’s analysis have been greatly summarized for this blog post.

***  The “orders” include outsiders such as NRC, INPO or corporate overseers.  Although this may not be totally accurate, I picture orders as akin to medieval guilds.