
Wednesday, August 7, 2013

Nuclear Industry Scandal in South Korea

As you know, over the past year trouble has been brewing in the South Korean nuclear industry.  A recent New York Times article* provides a good current status report.  The most visible problem is the falsification of test documents for nuclear plant parts.  Executives have been fired, and employees of both a testing company and the state-owned entity that inspects parts and validates their safety certificates have been indicted.

It should be no surprise that the underlying causes are rooted in the industry's structure and culture.  South Korea has only one nuclear utility, state-owned Korea Electric Power Corporation (Kepco).  Kepco retirees go to work for parts suppliers or invest in them.  Cultural attributes include valuing personal ties, including school and hometown connections, over regulations.  Bribery serves as a lubricating agent.

As a consequence,  “In the past 30 years, our nuclear energy industry has become an increasingly closed community that emphasized its specialty in dealing with nuclear materials and yet allowed little oversight and intervention,” the government’s Ministry of Trade, Industry and Energy said in a recent report to lawmakers. “It spawned a litany of corruption, an opaque system and a business practice replete with complacency.”

Couldn't happen here, right?  I hope not, but the U.S. nuclear industry, while not as closed a system as its Korean counterpart, is hardly an open community.  The “unique and special” mantra promotes insular thinking and encourages insiders to view outsiders with suspicion.  The secret practices of the industry's self-regulator do not inspire public confidence.  A familiar cast of NEI/INPO participants at NRC stakeholder meetings fuels concern over the degree to which the NRC has been captured by industry.  Utility business decisions that ultimately killed plants (CR3, Kewaunee, San Onofre) appear to have been made in conference rooms isolated from any informed awareness of worst-case technical/commercial consequences.  Our industry has many positive attributes, but these other traits should make us stop and reflect.

*  C. Sang-Hun, “Scandal in South Korea Over Nuclear Revelations,” New York Times (Aug. 3, 2013).  Retrieved Aug. 6, 2013.

Tuesday, June 18, 2013

The Incredible Shrinking Nuclear Industry

News came last week that the San Onofre units would permanently shut down - joining Crystal River 3 (CR3) and Kewaunee as the latest early retirees - filling in the last leg of a nuclear bad news trifecta.  This is distressing on many fronts, not the least of which is the loss of jobs for thousands of highly qualified nuclear personnel, and perhaps the suggestion of a larger trend.  Almost as distressing is NEI's characterization of San Onofre as a unique situation - as were CR3 and Kewaunee, by the way - and its placing of primary blame on the NRC.*  Really?  The more useful questions to ponder are what decisions led up to the plant closures and whether there is a common denominator.

We can think of one: decisions that failed to adequately account for the “tail” of the risk distribution, where outcomes, albeit of low probability, carry high consequences.  On this score checking in with Nassim Taleb is always instructive.  He observes, “This idea that in order to make a decision you need to focus on the consequences (which you can know) rather than the probability (which you can’t know) is the central idea of uncertainty.”**  Consider the decisions that preceded each closure:
  • For Kewaunee, the decision to purchase the plant with a power purchase agreement (PPA) that extended only eight years;
  • For CR3, the decision to undertake cutting the containment with in-house expertise;
  • For SONGS, the decision to purchase and install new-design steam generators from a vendor working beyond its historical experience envelope.
Whether or not the decision makers understood this, or even imagined that their decisions included the potential to lose the plants, the results speak for themselves.  These people were in Black Swan and fat tail territory and didn’t realize it.  Let’s look at a few details.
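Before getting to those details, a toy calculation makes Taleb’s point concrete.  The Python sketch below uses purely illustrative numbers (a $15 million project saving weighed against a hypothetical multi-billion-dollar worst case, loosely echoing the CR3 figures discussed later); it is a sketch under stated assumptions, not a model of any actual decision.

```python
import random

# Illustrative numbers only -- not actual project data.
SAVINGS = 15e6        # upside if the project goes as planned ($15 million)
TAIL_LOSS = 3.5e9     # hypothetical worst case, e.g., losing the plant

def expected_value(p_tail: float, trials: int = 200_000) -> float:
    """Monte Carlo estimate of expected value for a given tail probability."""
    total = 0.0
    for _ in range(trials):
        if random.random() < p_tail:
            total -= TAIL_LOSS    # rare, catastrophic outcome
        else:
            total += SAVINGS      # routine, expected outcome
    return total / trials

# The tail probability is precisely what Taleb says you cannot know,
# so sweep it rather than pretend to estimate it.
for p in (0.0001, 0.001, 0.005, 0.01):
    print(f"P(tail) = {p:.4f}: expected value = ${expected_value(p):,.0f}")
```

Somewhere below a one-percent tail probability the “saving” flips into a large expected loss.  Since that probability is unknowable, the consequence term is the only number a decision maker can actually act on - which is Taleb’s point.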

Kewaunee

Many commentators at this point are writing off the Kewaunee retirement as the product of the miracle of low gas prices.  Dominion cites gas prices and its inability to acquire additional nuclear units in the upper Midwest to achieve economies of scale.  But there is a far greater misstep in the story.  When Dominion purchased Kewaunee from Wisconsin Public Service in 2005, a PPA was included as part of the transaction.  Such an agreement is an expected and necessary part of the deal, as it establishes set prices for the sale of the plant’s output for a period of time.  A key consideration in structuring deals such as this is not only the specific pricing terms for the asset and the PPA, but the duration of the PPA.  In the case of Kewaunee the PPA ran for only 8 years, through December 2013.  After 8 years Dominion would have to negotiate another PPA with the local utilities or others, or sell into the market.  The question is: when buying an asset with a useful life of 28 years (with grant of the 20-year license extension), why would Dominion be OK with just an 8-year PPA?  Perhaps Dominion assumed that market prices would be higher in 8 years and wanted to capitalize on them.  Opponents of the transaction believed this to be the case.***  The prevailing expectation at the time was that demand growth would continue, along with the pricing necessary to accommodate current and planned generating units.  But the economic downturn capped demand and left a surplus of baseload capacity.  Local utilities, faced with the option of negotiating a PPA for Kewaunee - or thinning the field and protecting their own assets - did what was in their interest.

The reality is that Dominion rolled the dice on future power prices.  Interestingly, in the same time frame, 2007, the Point Beach units were purchased by NextEra Energy Resources (formerly FPL Energy).  In this transaction PPAs were negotiated through the end of the extended license terms of the units, 2030 and 2033, providing the basis for a continuing and productive future.
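A back-of-the-envelope comparison, using only the round numbers quoted above, shows how differently the two deals allocated market risk.  The sketch below is illustrative framing, not actual deal terms; Point Beach’s coverage is approximated with the longer of its two PPA terms.

```python
# Rough PPA coverage comparison; the year figures are the round numbers
# cited above, and everything else is framing for illustration.
deals = {
    "Kewaunee (Dominion, 2005)":   {"useful_life_yrs": 28, "ppa_yrs": 8},
    "Point Beach (NextEra, 2007)": {"useful_life_yrs": 26, "ppa_yrs": 26},
}

for name, d in deals.items():
    exposed = d["useful_life_yrs"] - d["ppa_yrs"]
    pct = 100 * exposed / d["useful_life_yrs"]
    print(f"{name}: {exposed} of {d['useful_life_yrs']} years "
          f"({pct:.0f}%) exposed to merchant power prices")
```

Dominion left roughly 70 percent of Kewaunee’s remaining life exposed to prices it could not know; NextEra left essentially none.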

Crystal River 3

In 2009 Progress Energy undertook a project to replace the steam generators in CR3.  As at some other nuclear plants, this necessitated cutting into the containment to allow removal of the old generators and placement of the new.

Apparently just two companies, Bechtel and SGT, had managed all the previous 34 steam generator replacement projects at U.S. nuclear power plants. Of those, at least 13 had involved cutting into the containment building. All 34 projects were successful.

For the management portion of the job, Progress got bids from both Bechtel and SGT.  The lower bid was from SGT, but Progress opted to self-manage the project to save an estimated $15 million.  During the containment cutting process, delamination of the concrete occurred in several places.  Subsequently, an outside engineering firm hired to do the failure analysis stated that cutting the steel tensioning bands in the sequence chosen by Progress Energy, along with removal of the concrete, had caused the containment building to crack.  Progress Energy disagreed, stating the cracks “could not have been predicted”.  (See Taleb’s view on uncertainty above.)

“Last year, the PSC endorsed a settlement agreement that let Progress Energy refund $288 million to customers in exchange for ending a public investigation of how the utility broke the nuclear plant.”****

When it came time to assess how to fix the damage, Progress Energy took a far more conservative and comprehensive approach.  They engaged multiple outside consultants and evaluated numerous possible repair options.  After Duke Energy acquired Progress, Duke engaged an independent, third-party review of the engineering and construction plan developed by Progress.  The independent review suggested that the cost was likely to be almost $1.5 billion.  However, in the worst-case scenario, it could cost almost $3.5 billion and take eight years to complete.  “...the [independent consultant] report concluded that the current repair plan ‘appears to be technically feasible, but significant risks and technical issues still need to be resolved, including the ultimate scope of any repair work.’”*****  Ultimately, consideration of the potentially huge cost and schedule consequences caused Duke to pull the plug.  Taleb would approve.

San Onofre

Southern California Edison undertook a project to replace its steam generators almost 10 years ago.  It decided to contract with Mitsubishi Heavy Industries (MHI) to design and construct the generators.  This would be new territory for Mitsubishi in terms of generator size and design complexity.  Following installation and a period of operation, tube leakage occurred due to excessive vibration.  The NRC determined that the problems in the steam generators were associated with errors in MHI's computer modeling, which led to underestimation of thermal-hydraulic conditions in the generators.

“Success in developing a new and larger steam generator design requires a full understanding of the risks inherent in this process and putting in place measures to manage these risks….Based upon these observations, I am concerned that there is the potential that design flaws could be inadvertently introduced into the steam generator design that will lead to unacceptable consequences (e.g., tube wear and eventually tube plugging). This would be a disastrous outcome for both of us and a result each of our companies desire to avoid. In evaluating this concern, it would appear that one way to avoid this outcome is to ensure that relevant experience in designing larger sized steam generators be utilized. It is my understanding the Mitsubishi Heavy Industries is considering the use of Westinghouse in several areas related to scaling up of your current steam generator design (as noted above). I applaud your effort in this regard and endorse your attempt to draw upon the expertise of other individuals and company's to improve the likelihood of a successful outcome for this project.”#

Unfortunately, these concerns were raised by SCE after it had let the contract to Mitsubishi.  SCE placed all of its hopes on improving the likelihood of a successful outcome while stating that a design flaw would be “disastrous”.  They were right about the disaster part.

Take Away

These are cautionary tales on a significant scale.  Delving into how such high-risk (technical and financial) decisions were made and turned out so badly could provide useful lessons learned.  That doesn’t appear likely, given the interests of the parties and the inconsistency of such an inquiry with the industry predicate of operational excellence.

With regard to our subject of interest, safety culture, the dynamics of safety decisions are subject to similar issues and bear directly on safety outcomes.  Recall that in our recent posts on implementing safety culture policy, we proposed a scoring system for decisions that includes the safety significance and uncertainty associated with the issue under consideration.  The analog to Taleb’s “central idea of uncertainty” is intentional and necessary.  Taleb argues you can’t know the probability of consequences.  We don’t disagree, but as a “known unknown” we think it is useful for decision makers to recognize how uncertain the significance (consequences) may be and calibrate their decisions accordingly.
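For readers who want to see the shape of such a scheme, here is a minimal hypothetical sketch.  The fields, scales and multiplicative weighting are illustrative assumptions for this post, not the actual scoring system from our earlier work; the point is simply that uncertainty should amplify, not discount, safety significance.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    description: str
    safety_significance: int  # 1 (minor) .. 5 (major) -- illustrative scale
    uncertainty: int          # 1 (well understood) .. 5 (unknowable tail)

    def risk_score(self) -> int:
        # Assumption: uncertainty amplifies significance, so a modest issue
        # with a poorly understood tail outranks a well-characterized one.
        return self.safety_significance * self.uncertainty

decisions = [
    Decision("Routine valve repacking, well-known history", 2, 1),
    Decision("Self-managed containment cut", 3, 4),
    Decision("First-of-a-kind steam generator design", 3, 5),
]

for d in sorted(decisions, key=lambda d: d.risk_score(), reverse=True):
    print(f"score {d.risk_score():>2}: {d.description}")
```

Scored this way, the CR3 and SONGS decisions would have stood out for extra scrutiny long before any metal was cut.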


*  “Of course, it’s regrettable...Crystal River is closing, the reasons are easy to grasp, and they are unique to the plant. Even San Onofre, which has also been closed for technical reasons (steam generator problems there), is quite different in specifics and probable outcome. So – unfortunate, yes; a dire pox upon the industry, not so much.”  NEI Nuclear Notes (Feb. 7, 2013).  Retrieved June 17, 2013.  For the NEI/SCE perspective on regulatory foot-dragging and uncertainty, see W. Freebairn et al, "SoCal Ed to retire San Onofre nuclear units, blames NRC delays," Platts (June 7, 2013).  Retrieved June 17, 2013.  And "NEI's Peterson discusses politics surrounding NRC confirmation, San Onofre closure," Environment & Energy Publishing OnPoint (June 17, 2013).  Retrieved June 17, 2013.

**  N. Taleb, The Black Swan (New York: Random House, 2007), p. 211.  See also our post on Taleb dated Nov. 9, 2011.

***  The Customers First coalition that opposed the sale of the plant in 2004 argued: “Until 2013, a complex purchased-power agreement subject to federal jurisdiction will replace PSCW review. After 2013, the plant’s output will be sold at prices that are likely to substantially exceed cost.”  Customers First!, "Statement of Position: Proposed Sale of the Kewaunee Nuclear Power Plant April 2004" (April, 2004).  Retrieved June 17, 2013.

****  R. Trigaux, "Who's to blame for the early demise of Crystal River nuclear power plant?" Tampa Bay Times (Feb. 5, 2013).  Retrieved Jun 17, 2013.  We posted on CR3's blunder and unfolding financial mess on Nov. 11, 2011.

*****  "Costly estimates for Crystal River repairs," World Nuclear News (Oct. 2, 2012).  Retrieved June 17, 2013.

#  D.E. Nunn (SCE) to A. Sawa (Mitsubishi), "Replacement Steam Generators San Onofre Nuclear Generating Station, Units 2 & 3" (Nov. 30, 2004).  Copy retrieved June 17, 2013 from U.S. Senate Committee on Environment & Public Works, attachment to Sen. Boxer's May 28, 2013 press release.


Wednesday, May 8, 2013

Safety Management and Competitiveness

We recently came across a paper that should be of significant interest to nuclear safety decision makers.  “Safety Management in a Competitiveness Context” was presented in March 2008 by Jean-Marie Rousseau of the Institut de Radioprotection et de Sûreté Nucléaire (IRSN).  As the title suggests, the paper examines the effects of competitive pressures on a variety of nuclear safety management issues including decision making and the priority accorded safety.  Not surprisingly:

“The trend to ignore or to deny this phenomenon is frequently observed in modern companies.” (p. 7)

The results presented in the paper derive from a safety assessment performed by IRSN to examine the safety management of EDF [Électricité de France] reactors, including:

“How real is the ‘priority given to safety’ in the daily arbitrations made at all nuclear power plants, particularly with respect to the other operating requirements such as costs, production, and radiation protection or environmental constraints?” (p. 2)

The pertinence is clear as “priority given to safety” is the linchpin of safety culture policy and expected behaviors.  In addition the assessment focused on decision-making processes at both the strategic and operational levels.  As we have argued, decisions can provide significant insights into how safety culture is operationalized by nuclear plant management. 

Rousseau views nuclear operations as a “highly complex socio-technical system” and his paper provides a brief review of historical data where accidents or near misses displayed indications of the impact of competing priorities on safety.  The author notes that competitiveness is necessary, just as safety is, and as such it represents another risk that must be managed at the organizational and managerial levels.  This characterization is intriguing and merits further reflection, particularly by regulators in their pursuit of “risk informed regulation”.  Nominally, regulators apply a conceptualization of risk that is centered on hardware and natural phenomena.  But safety culture and competitive pressures could also be justified as risks to assuring safety - in fact, much more dynamic risks - and thus be part of the framework of risk informed regulation.*  Often, as is the case with this paper, there is some tendency to assert that achievement of safety is coincident with overall performance excellence - which in a broad sense it is - but there are nonetheless many instances of considerable tension, and potential risk.

Perhaps most intriguing in the assessment is the evaluation of EDF’s a posteriori analyses of its decision making processes as another dimension of experience feedback.**   We quote the paper at length:

“The study has pointed out that the OSD***, as a feedback experience tool, provides a priori a strong pedagogic framework for the licensee. It offers a context to organize debates about safety and to share safety representations between actors, illustrated by a real problematic situation. It has to be noticed that it is the only tool dedicated to “monitor” the safety/competitiveness relationship.

"But the fundamental position of this tool (“not to make judgment about the decision-maker”) is too restrictive and often becomes “not to analyze the decision”, in terms of results and effects on the given situation.

"As the existence of such a tool is judged positively, it is necessary to improve it towards two main directions:
- To understand the factors favouring the quality of a decision-making process. To this end, it is necessary to take into account the decision context elements such as time pressure, fatigue of actors, availability of supports, difficulties in identifying safety requirements, etc.
- To understand why a “qualitative decision-making process” does not always produce a “right decision”. To this end, it is necessary to analyze the decision itself with the results it produces and the effects it has on the situation.” (p. 8)

We feel this is a very important aspect that currently receives insufficient attention.  Decisions can provide a laboratory of safety management performance and safety culture actualization.  But how often are decisions adequately documented, preserved, critiqued and shared within the organization?  Decisions that yield a bad (reportable) result may receive scrutiny internally and by regulators, but our studies indicate there is rarely sufficient forensic analysis - cause analyses are almost always one-dimensional and oriented to hardware and process.  Decisions with benign outcomes - whether the result of “good” decision making or not - are rarely preserved or assessed.  The potential benefits of detailed consideration of decisions have been demonstrated in many of the independent assessments of accidents (Challenger, Columbia, the BP Texas City refinery, etc.) and in research by Perin and others.

We would go a step further than proposed enhancements to the OSD.  As Rousseau notes there are downsides to the routine post-hoc scrutiny of actual decisions - for one it will likely identify management errors even in the absence of a bad decision outcome.  This would be one more pressure on managers already challenged by a highly complex decision environment.  An alternative is to provide managers the opportunity to “practice” making decisions in an environment that supports learning and dialogue on achieving the proper balances in decisions - in other words in a safety management simulator.  The industry requires licensed operators to practice operations decisions on a simulator for similar reasons - why not nuclear managers charged with making safety decisions?



*  As the IAEA has noted, “A danger of concentrating too much on a quantitative risk value that has been generated by a PSA [probabilistic safety analysis] is that...a well-designed plant can be operated in a less safe manner due to poor safety management by the operator.”  IAEA-TECDOC-1436, Risk Informed Regulation of Nuclear Facilities: Overview of the Current Status, February 2005.

**  EDF implemented safety-availability-radiation protection-environment observatories (SAREOs) to increase awareness of the arbitration between safety and other performance factors.  SAREOs analyze in each station the quality of the decision-making process and propose actions to improve it and to guarantee compliance with rules in any circumstances.  [“Nuclear Safety: our overriding priority,” EDF Group’s file responding to FTSE4Good nuclear criteria]


***  Per Rousseau, “The OSD (Observatory for Safety/Availability) is one of the “safety management levers” implemented by EDF in 1997. Its objective is to perform retrospective analyses of high-stake decisions, in order to improve decision-making processes.” (p. 7)

Tuesday, November 20, 2012

BP/Deepwater Horizon: Upping the Stakes

Anyone who thought safety culture and safety decision making were an institutional artifact, or mostly a matter of regulatory enforcement, might want to take a close look at what is happening on the BP/Deepwater Horizon front these days.  Three BP employees have been criminally indicted - and two of those indictments bear directly on safety in operational decisions.  The indictments of the well-site leaders, the most senior BP personnel on the platform, accuse them of causing the deaths of 11 crewmen aboard the Deepwater Horizon rig in April 2010 through gross negligence, primarily by misinterpreting a crucial pressure test that should have alerted them that the well was in trouble.*

The crux of the matter relates to the interpretation of a pressure test to determine whether the well had been properly sealed prior to being temporarily abandoned. Apparently BP’s own investigation found that the men had misinterpreted the test results.

The indictment states, “The Well Site Leaders were responsible for...ensuring that well drilling operations were performed safely in light of the intrinsic danger and complexity of deepwater drilling.” (Indictment p.3)

The following specific actions are cited as constituting gross negligence: “...failed to phone engineers onshore to advise them ...that the well was not secure; failed to adequately account for the abnormal readings during the testing; accepted a nonsensical explanation for the abnormal readings, again without calling engineers onshore to consult…” (Indictment p.7)

The willingness of federal prosecutors to advance these charges should (and perhaps is intended to) send a chill down every manager’s spine in high risk industries.  While gross negligence is a relatively high standard, and may or may not be provable in the BP case, the actions cited in the indictment may not sound all that extraordinary - failure to consult with onshore engineers, failure to account for “abnormal” readings, accepting a “nonsensical” explanation.  Whether this amounts to “reckless” or willful disregard for a known risk is a matter for the legal system.  As an article in the Wall Street Journal notes, “There were no federal rules about how to conduct such a test at the time. That has since changed; federal regulators finalized new drilling rules last week that spell out test procedures.”**

The indictment asserts that the men violated the “standard of care” applicable to the deepwater oil exploration industry. One might ponder what federal prosecutors think the “standard of care” is for the nuclear power generation industry.
 

Clearly the well site leaders made a serious misjudgment - one that turned out to have catastrophic consequences.  But then consider the statement by the Assistant Attorney General that the accident was caused by “BP’s culture of privileging profit over prudence.” (WSJ article)  Are there really a few simple, direct causes of this accident, or is this an example of a highly complex system failure?  Where does culpability for culture lie?  Stay tuned.


* U.S. District Court Eastern District of Louisiana, “Superseding Indictment for Involuntary Manslaughter, Seaman's Manslaughter and Clean Water Act: United States of America v. Robert Kaluza and Donald Vidrine,” Criminal No. 12-265.


** T. Fowler and R. Gold, “Engineers Deny Charges in BP Spill,” Wall Street Journal online (Nov. 18, 2012).



Friday, March 23, 2012

Going Beyond SCART: A More Useful Guidebook for Evaluating Safety Culture

Our March 11 post reviewed the IAEA SCART guidelines.  We found its safety culture characteristics and attributes comprehensive but its “guiding questions” for evaluators were thin gruel, especially in the areas we consider critical for safety culture: decision making, corrective action, work backlogs and management incentives.

This post reviews another document that combines the SCART guidelines, other IAEA documents and the author’s insights to yield a much more robust guidebook for evaluating a facility’s safety culture.  It’s called “Guidelines for Regulatory Assessment of Safety Culture in Licensees’ Organisations.”*  It starts with the SCART characteristics and attributes but gives more guidance to an evaluator: recommendations for documents to review, what to look for during the evaluation, additional (and more critical) guiding questions, and warning signs that can indicate safety culture weaknesses or problems.

Specific guidance in the areas we consider critical is generally more complete.  For example, in the area of decision making, evaluators are told to look for a documented process applicable to all matters that affect safety, attend meetings to observe the decision-making process, note the formalization of the decision making process and how/if long-term consequences of decisions are considered.  Goal conflict is explicitly addressed, including how differing opinions, conflict based on different experiences, and questioning attitudes are dealt with, and the evidence of fair and impartial methods to resolve conflicts.  Interestingly, example conflicts are not limited to the usual safety vs. cost or production but include safety vs. safety, e.g., a proposed change that would increase plant safety but cause additional personnel rad exposure to implement.  Evidence of unresolved conflicts is a definite warning flag for the evaluator. 

Corrective action (CA) also gets more attention, with questions and flags covering CA prioritization based on safety significance, the timely implementation of fixes, lack of CA after procedure violations or regulatory findings, verification that fixes are implemented and effective, and overall support or lack thereof for the CAP. 

Additional questions and flags cover backlogs in maintenance, corrective actions, procedure changes, unanalyzed physical or procedural problems, and training.

However, the treatment of management incentives is still weak, basically the same as the SCART guidelines.  We recommend a more detailed evaluation of the senior managers’ compensation scheme or, in more direct language, of how much they get paid for production and how much for safety.
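As an illustration of what such an evaluation might tabulate, here is a tiny hypothetical sketch; the categories and percentages are invented for this post, not drawn from any actual compensation plan.

```python
# Hypothetical senior-manager incentive plan, invented for illustration.
incentive_weights = {
    "capacity factor / generation": 40,   # production
    "O&M cost control":             25,   # production
    "corporate earnings":           20,   # production
    "safety performance":           15,   # safety
}

safety = incentive_weights["safety performance"]
production = sum(incentive_weights.values()) - safety
print(f"Incentives tied to production: {production}%")
print(f"Incentives tied to safety:     {safety}%")
```

An evaluator who cannot produce this kind of breakdown from a licensee’s actual plan documents has not really assessed the incentive dimension of safety culture.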

The intended audience for this document is a regulator charged with assessing a licensee’s safety culture.  As we have previously discussed, some regulatory agencies are evaluating this approach.  For now, that’s a no-go in the U.S.  In any case, these guidelines provide a good checklist for self-assessors, internal auditors and external consultants.


*  M. Tronea, “Guidelines for Regulatory Oversight of Safety Culture in Licensees’ Organisations” Draft, rev. 8 (Bucharest, Romania:  National Commission for Nuclear Activities Control [CNCAN], April 2011).  In addition to being on the staff of CNCAN, the nuclear regulatory authority of Romania, Dr. Tronea is the founder/manager of the LinkedIn Nuclear Safety group.  

Wednesday, December 21, 2011

From SCWE to Safety Culture—Time for the Soapbox

Is a satisfactory Safety Conscious Work Environment (SCWE) the same as an effective safety culture (SC)?  Absolutely not.  However, some of the reports and commentary we’ve seen on troubled facilities appear to mash the terms together.  I can’t prove it, but I suspect facilities that rely heavily on lawyers to rationalize their operations are encouraged to try to pass off SCWE as SC.  In any case, following is a review of the basic components of SC:

Safety Conscious Work Environment

An acceptable SCWE* is one where employees are encouraged and feel free to raise safety-related issues without fear of retaliation by their employer.  Note that it does not necessarily address individual employees’ knowledge of or interest in such issues.

Problem Identification and Resolution (PI&R)

PI&R is usually manifested in a facility’s corrective action program (CAP).  An acceptable CAP has a robust, transparent process for evaluating, prioritizing and resolving specific issues.  The prioritization step includes an appropriate weight for an issue’s safety-related elements.  CAP backlogs are managed to levels that employees and regulators associate with timely resolution of issues.

However, the CAP often only deals with identified issues.  Effective organizations must also anticipate problems and develop plans for addressing them.  Again, safety must have an appropriate priority.

Organizational Decision Making

The best way to evaluate an organization’s culture, including safety culture, is through an in-depth analysis of a representative sample of key decisions.  How did the decision-making process handle competing goals, set priorities, treat devil’s advocates who raised concerns about possible unfavorable outcomes, and assign resources?  Were the most qualified people involved in the decisions, regardless of their position or rank?  Note that this evaluation should not be limited to situations where the decisions led to unfavorable consequences; after all, most decisions lead to acceptable outcomes.  The question here is “How were safety concerns handled in the decision making process, independent of the outcome?”

Management Behavior

What is management’s role in all this?  Facility and corporate managers must “walk the talk” as role models demonstrating the importance of safety in all aspects of organizational life.  They must provide personal leadership that reinforces safety.  They must establish a recognition and reward system that reinforces safety.  Most importantly, they must establish and maintain the explicit and implicit weighting factors that go into all decisions.  All of these actions reinforce the desired underlying assumptions with respect to safety throughout the organization. 

Conclusion

Establishing a sound safety culture is not rocket science but it does require focus and understanding (a “mental model”) of how things work.  SCWE, PI&R, Decision Making and Management Behavior are all necessary components of safety culture.  Not to put too fine a point on it, but safety culture is a lot more than quoting a survey result that says “workers feel free to ask safety-related questions.”


*  SCWE questions have also been raised on the LinkedIn Nuclear Safety and Nuclear Safety Culture discussion forums.  Some of the commentary is simple bloviating but there are enough nuggets of fact or insight to make these forums worth following.

Monday, December 5, 2011

Regulatory Assessment of Safety Culture—Not Made in U.S.A.

Last February, the International Atomic Energy Agency (IAEA) hosted a four-day meeting of regulators and licensees on safety culture.*  “The general objective of the meeting [was] to establish a common opinion on how regulatory oversight of safety culture can be developed to foster safety culture.”  In fewer words, how can the regulator oversee and assess safety culture?

While no groundbreaking new methods for evaluating a nuclear organization’s safety culture were presented, the mere fact there is a perception that oversight methods need to be developed is encouraging.  In addition, outside the U.S., it appears more likely that regulators are expected to engage in safety culture oversight if not formal regulation.

Representatives from several countries made presentations.  The NRC presentation discussed the then-current status of the effort that led to the NRC safety culture policy statement announced in June.  The presentations covering Belgium, Bulgaria, Indonesia, Romania, Switzerland and Ukraine described different efforts to include safety culture assessment into licensee evaluations.

Perhaps the most interesting material was a report on an attendee survey** administered at the start of the meeting.  The survey covered “national regulatory approaches used in the oversight of safety culture.” (p. 3)  Eighteen member states completed the survey.  Following are a few key findings:

The states were split about 50-50 between having and not having regulatory requirements related to safety culture. (p. 7)  The IAEA is encouraging regulators to get more involved in evaluating safety culture and some countries are responding to that push.

To minimize subjectivity in safety culture oversight, regulators try to use oversight practices that are transparent, understandable, objective, predictable, and both risk-informed and performance-based. (p. 13)  This is not news but it is a good thing; it means regulators are trying to use the same standards for evaluating safety culture as they use for other licensee activities.

Licensee decision-making processes are assessed using observations of work groups, probabilistic risk analysis, and during the technical inspection. (p. 15)  This seems incomplete or even weak to us.  In-depth analysis of critical decisions is necessary to reveal the underlying assumptions (the hidden, true culture) that shape decision-making.

Challenges include the difficulty in giving an appropriate priority to safety in certain real-time decision-making situations and the work pressure of achieving production targets and keeping to the schedule of outages. (p. 16)  We have been pounding the drum about goal conflict for a long time and this survey finding simply confirms that the issue still exists.

Bottom Line

The meeting was generally consistent with our views.  Regulators and licensees need to focus on cultural artifacts, especially decisions and decision making, in the short run while trying to influence the underlying assumptions in the long run to reduce or eliminate the potential for unexpected negative outcomes.



**  A. Kerhoas, "Synthesis of Questionnaire Survey."

Friday, November 11, 2011

The Mother of Bad Decisions?

This is not about safety culture, but it’s nuclear related and, given our recent emphasis on decision-making, we can’t pass over it without commenting.

The steam generators (SGs) were recently replaced at Crystal River 3.  This was a large and complex undertaking but SGs have been successfully replaced at many other plants.  The Crystal River project was more complicated because it required cutting an opening in the containment but this, too, has been successfully accomplished at other plants.

The other SG replacements were all managed by two prime contractors, Bechtel and the Steam Generator Team (SGT).  However, to save a few bucks - $15 million actually - Crystal River decided to manage the project itself.  (For perspective, the target cost for the prime contractor, exclusive of incentive fee, was $73 million.)  (Franke, Exh. JF-32, p. 8)*
 
Cutting the opening resulted in delamination of the containment; basically, the outer 10 inches of concrete separated from the overall 42-inch-thick structure in an area near the opening.  Repairing the plant and replacement power costs are estimated at more than $2.5 billion.**  It’s not clear when the plant will be running again, if ever.

Progress Energy Florida (PEF), the plant owner, says insurance will cover most of the costs.  We’ll see.  But PEF also wants Florida ratepayers to pay.  PEF claims they “managed and executed the SGR [steam generator replacement] project in a reasonable and prudent manner. . . .”  (Franke, p. 3)

The delamination resulted from “unprecedented and unpredictable circumstances beyond PEF's control and in spite of PEF's prudent management. . . .” (Franke, p. 2)

PEF’s “root cause investigation determined that there were seven factors that contributed to the delamination. . . . These factors combined to cause the delamination during the containment opening activities in a complex interaction that was unprecedented and unpredictable.” [emphasis added]  (Franke, p. 27)***

This is an open docket, i.e., the Florida PSC has not yet determined how much, if anything, the ratepayers will have to pay.  Will the PSC believe that a Black Swan settled at the Crystal River plant?  Or is the word “hubris” more likely to come to mind?


* “Testimony & Exhibits of Jon Franke,” Fla. Public Service Commission Docket No. 100437-EI (Oct. 10, 2011).

**  I. Penn, “Cleaning up a DIY repair on Crystal River nuclear plant could cost $2.5 billion,” St. Petersburg Times via tampabay.com website (Oct. 9, 2011).  This article provides a good summary of the SG replacement project.

***  For the detail-oriented, “. . . the technical root cause of the CR3 wall delamination was the combination of: 1) tendon stresses; 2) radial stresses; 3) industry design engineering analysis inadequacies for stress concentration factors; 4) concrete strength properties; 5) concrete aggregate properties; and 6) the de-tensioning sequence and scope. . . . another factor, the process of removing the concrete itself, likely contributed to the extent of the delamination. . . .” From “Testimony & Exhibits of Garry Miller,” Fla. Public Service Commission Docket No. 100437-EI (Oct. 10, 2011), p. 5.

Monday, September 12, 2011

Understanding the Risks in Managing Risks

Our recent blog posts have discussed the work of anthropologist Constance Perin.  This post looks at her book, Shouldering Risks: The Culture of Control in the Nuclear Power Industry.*  The book presents four lengthy case studies of incidents at three nuclear power plants and Perin’s analysis which aims to explain the cultural attributes that facilitated the incidents’ occurrence or their unfavorable evolution.

Because they fit nicely with our interest in decision-making, this post will focus on the two case studies that concerned hardware issues.**  The first case involved a leaking, unisolable valve in the reactor coolant system (RCS) that needed repacking, a routine job.  The mechanics put the valve on its backseat, opened it, observed the packing moving up (indicating that the water pressure was too high or the backseat step hadn't worked), and closed it up.  After management meetings to review the situation, the mechanics tried again, packing came out, and the leak became more serious.  The valve stem and disc had separated, a fact that was belatedly recognized.  The leak was eventually sufficiently controlled so the plant could wait until the next outage to repair/replace the valve.  

The second case involved a switchyard transformer that exhibited a hot spot during a thermography examination.  Managers initially thought they had a circulating current issue, a common problem.  After additional investigations, including people climbing on ladders up alongside the transformer, a cover bolt was removed and the employee saw a glow inside the transformer, the result of a major short.  Transformers can, and have, exploded from such thermal stresses but the plant was able to safely shut down to repair/replace the transformer.

In both cases, there was at least one individual who knew (or strongly suspected) that something more serious was wrong from the get-go but was unable to get the rest of the organization to accept a more serious, i.e., costly, diagnosis.

Why were the plant organizations so willing, even eager, to assume the more conventional explanations for the problems they were seeing?  Perin provides a multidimensional framework that helps answer that question.

The first dimension is the tradeoff quandary, the ubiquitous tension between production and cost, including costs associated with safety.  Plant organizations are expected to be making electricity, at a budgeted cost, and that subtle (or not-so-subtle) pressure colors the discussion of any problem.  There is usually a preference for a problem explanation and corrective action that allows the plant to continue running.

Three control logics constitute a second dimension.  The calculated logics are the theory of how a plant is (or should be) designed, built, and operated.  The real-time logics consist of the knowledge of how things actually work in practice.  Policy logics come from above, and represent generalized guidelines or rules for behavior, including decision-making.  An “answer” that comes from calculated or policy logic will be preferred over one that comes from real-time logic, partly because the former have been developed by higher-status groups and partly because such answers are more defensible to corporate bosses and regulators.

Finally, traditional notions of group and individual status and a key status property, credibility, populate a third dimension: design engineers over operators over system engineers over maintenance over others; managers over individual contributors; old-timers over newcomers.  Perin creates a construct of the various “orders”*** in a plant organization - specialist groups such as operators or system engineers.  Each order has its own worldview, values and logics - optimum conditions for nurturing organizational silos.  Information and work flows are mediated among different orders via plant-wide programs (themselves products of calculated and policy logics).
 
Application to Cases

The aforementioned considerations can be applied to the two cases.  Because the valve was part of the RCS, it should have been subject to more detailed planning, including additional risk analysis and contingency prep.  This was pointed out by a new-to-his-job work planner who was basically ignored because of his newcomer status.  And before the work was started, the system engineer (SE) observed that this type of valve (which had a problem history at this plant and elsewhere) was prone to valve disk/stem separation and this particular valve appeared to have the problem based on his visual inspection (it had one thread less visible than other similar valves).  But the SE did not make his observations forcefully and/or officially (by initiating a CR) so his (accurate) observation was not factored into the early decision-making.  Ultimately, their concerns did not sway the overall discussion where the schedule was highest priority.  A radiographic examination that would have shown the valve/disc separation was not performed early on because that was an Engineering responsibility and the valve repair was a Maintenance project.

The transformer is on the non-nuclear side of the plant, which makes the attitudes toward it less focused and critical than for safety-related equipment.  The hot spot was discovered by a tech who was working with a couple of thermography consultants.  Thermography was a relatively new technology at this plant and not well-understood by plant managers (or trusted because early applications had given false alarms).  The tech said that the patterns he observed were not typical for circulating currents but neither he nor the consultants (the three people on-site who understood thermography) were in the meetings where the problem was discussed.  The circulating current theory was popular because (a) the plant had experienced such problems in the past and (b) addressing it could be done without shutting down the plant.  Production pressure, the nature of past problems, and the lower status of roles and equipment that are not safety related all acted to suppress the emergent new knowledge of what the problem actually was.  

Lessons Learned

Perin’s analytic constructs are complicated and not light reading.  However, the interviews in the case studies are easy to read and very revealing.  It will come as no surprise to people with consulting backgrounds that the interviewees were capable of significant introspection.  In the harsh light of hindsight, lots of folks can see what should (and could) have happened.  

The big question is what did those organizations learn?  Will they make the same mistakes again?  Probably not.  But will they misinterpret future weak or ambiguous signals of a different nascent problem?  That’s still likely.  “Conventional wisdom” codified in various logics and orders and guided by a production imperative remains a strong force working against the open discussion of alternative explanations for new experiences, especially when problem information is incomplete or fuzzy.  As Bob Cudlin noted in his August 17, 2011 post: [When dealing with risk-imbued issues] “the intrinsic uncertainties in significance determination opens the door to the influence of other factors - namely those ever present considerations of cost, schedule, plant availability, and even more personal interests, such as incentive programs and career advancement.”

   
*  C. Perin, Shouldering Risks: The Culture of Control in the Nuclear Power Industry, (Princeton, NJ: Princeton University Press, 2005).

**  The case studies and Perin’s analysis have been greatly summarized for this blog post.

***  The “orders” include outsiders such as NRC, INPO or corporate overseers.  Although this may not be totally accurate, I picture orders as akin to medieval guilds.

Wednesday, August 17, 2011

Additional Thoughts on Significance Culture

Our previous post introduced the work of Constance Perin, Visiting Scholar in Anthropology at MIT, including her thesis of “significance culture” in nuclear installations.  Here we expand on the intersection of her thesis with some of our work.

Perin places primary emphasis on the availability and integration of information to systematize and enhance the determination of risk significance.  This becomes the true organizing principle of nuclear operational safety and supplants the often hazy construct of safety culture.  We agree with the emphasis on more rigorous and informed assessments of risk as an organizing principle and focus for the entire organization. 

Perin observes: “Significance culture arises out of a knowledge-using and knowledge-creating paradigm. Its effectiveness depends less on “management emphasis” and “personnel attitudes” than on having an operational philosophy represented in goals, policies, priorities, and actions organized around effectively characterizing questionable conditions before they can escalate risk.” (Significance Culture, p. 3)*

We found a similar thought from Kenneth Brawn on a recent LinkedIn post under the Nuclear Safety Group.  He states, “Decision making, and hence leadership, is based on accurate data collection that is orchestrated, focused, real time and presented in a structured fashion for a defined audience….Managers make decisions based on stakeholder needs – the problem is that risk is not adequately considered because not enough time is taken (given) to gather and orchestrate the necessary data to provide structured information for the real time circumstances.” ** 

While seeing the potential unifying force of significance culture, we are also mindful that such determinations often are made under a cloak of precision that is not warranted or routinely achievable.  Such analyses are complex, uncertain, and subject to considerable judgment by the involved analysts and decision makers.  In other words, they are inherently fuzzy.  This limitation can only be partly remedied through better availability of information.  Nuclear safety does not generally include “bright lines” of acceptable or unacceptable risks, or finely drawn increments of risk.  Sure, PRA analyses and other “risk informed” approaches provide the illusion of quantitative precision, and often provide useful insight for devising courses of action that do not pose “undue risk” to public safety.  But one does not have to read too many Licensee Event Reports (LERs) to see that risk determinations are ultimately shades of gray.  For one example, see the background information on our decision scoring example involving a pipe leak in a 30” moderate energy piping elbow and interim repair.  The technical justification for the interim fix included terms such as “postulated”, “best estimate” and “based on the assumption”.  A full reading of the LER makes clear the risk determination involved considerable qualitative judgment by the licensee in making its case and the NRC in approving the interim measure.  That said, the NRC’s justification also rested in large part on a finding of “hardship or unusual difficulty” if a code repair were to be required immediately.

Where is this leading us?  Are poor safety decisions the result of the lack of quality information?  Perhaps.  However, another scenario that is at least equally likely is that the appropriate risk information may not be pursued vigorously, or the information may be interpreted in the light most favorable to the organization’s other priorities.  We believe that the intrinsic uncertainties in significance determination opens the door to the influence of other factors - namely those ever-present considerations of cost, schedule, plant availability, and even more personal interests, such as incentive programs and career advancement.  Where significance is fuzzy, it invites rationalization in the determination of risk and marginalization of the intrinsic uncertainties.  Thus a desired decision outcome could encourage tailoring of the risk determination to achieve the appropriate fit.  It may mean that Perin’s focus on “effectively characterizing questionable conditions” must also account for the presence and potential influence of other non-safety factors as part of the knowledge paradigm.

This brings us back to Perin’s ideas for how to pull the string and dig deeper into this subject.  She finds, “Condition reports and event reviews document not only material issues. Uniquely, they also document systemic interactions among people, priorities, and equipment — feedback not otherwise available.” (Significance Culture, p.5)  This emphasis makes a lot of sense and in her book, Shouldering Risks: The Culture of Control in the Nuclear Power Industry, she takes up the challenge of delving into the depths of a series of actual condition reports.  Stay tuned for our review of the book in a subsequent post.


*  C. Perin, “Significance Culture in Nuclear Installations,” a paper presented at the 2005 Annual Meeting of the American Nuclear Society (June 6, 2005).

**  You may be asked to join the LinkedIn Nuclear Safety group to view Mr. Brawn's comment and the discussion of which it is part.

Friday, August 12, 2011

An Anthropologist’s View

Academics in many disciplines study safety culture.  This post introduces the work of MIT anthropologist Constance Perin to this blog and discusses a paper* she presented at the 2005 ANS annual meeting.

We picked a couple of the paper’s key recommendations to share with you.  First, Perin’s main point is to advocate the development of a “significance culture” in nuclear power plant organizations.  The idea is to organize knowledge and data in a manner that allows an organization to determine significance with respect to safety issues.  The objective is to increase an organization’s capabilities to recognize and evaluate questionable conditions before they can escalate risk.  We generally agree with this aim.  The real nub of safety culture effectiveness is how it shapes the way an organization responds to new or changing situations.

Perin understands that significance evaluation already occurs in both formal processes (e.g., NRC evaluations and PRAs) and in the more informal world of operational decisions, where trade-offs, negotiations, and satisficing behavior may be more dynamic and less likely to be completely rational.  She recommends that significance evaluation be ascribed a higher importance, i.e., be more formally and widely ingrained in the overall plant culture, and used as an organizing principle for defining knowledge-creating processes. 

Second, because of the importance of a plant's Corrective Action Program (CAP), Perin proposes making NRC assessment of the CAP the “eighth cornerstone” of the Reactor Oversight Process (ROP).  She criticizes the NRC’s categorization of cross-cutting issues for not being subjected to specific criteria and performance indicators.  We have a somewhat different view.  Perin’s analysis does not acknowledge that the industry places great emphasis on each of the cross-cutting issues in terms of performance indicators and monitoring, including self-assessment.**  The same is true of the other cornerstones, where the plants use many more indicators to track and trend performance than the few included in the ROP.  In our opinion, a real problem with the ROP is that its few indicators do not provide any reliable or forward-looking picture of nuclear safety.

The fault line in the CAP itself may be better characterized in terms of the lack of measurement and assessment of how well the CAP functions to sustain a strong safety culture.  Importantly, such an approach would evaluate whether decisions on conditions adverse to quality properly assessed not only significance but also balanced the influence of any competing priorities.  Perin recognizes that competing priorities exist, especially in the operational world, but making the CAP a cornerstone might actually lead to increased false confidence in the CAP if its relationship with safety culture was left unexamined.

Prof. Perin has also written a book, Shouldering Risks: The Culture of Control in the Nuclear Power Industry,*** which is an ethnographic analysis of nuclear organizations and specific events they experienced.  We will be reviewing this book in a future post.  We hope that her detailed drill down on those events will yield some interesting insights, e.g., how different parts of an organization looked at the same situation but had differing evaluations of its risk implications.

We have to admit we didn’t detect Prof. Perin on our radar screen; she alerted us to the presence of her work.  Based on our limited review to date, we think we share similar perspectives on the challenges involved in attaining and maintaining a robust safety culture.


*  C. Perin, “Significance Culture in Nuclear Installations,” a paper presented at the 2005 Annual Meeting of the American Nuclear Society (June 6, 2005).

** The issue may be one of timing.  Prof. Perin based her CAP recommendation, in part, on a 2001 study that suggested licensees’ self-regulation might be inadequate.  We have the benefit of a more contemporary view.  

*** C. Perin, Shouldering Risks: The Culture of Control in the Nuclear Power Industry, (Princeton, NJ: Princeton University Press, 2005).

Friday, June 24, 2011

Rigged Decisions?

The Wall Street Journal reported on June 23, 2011* on an internal investigation conducted by Transocean, owner of the Deepwater Horizon drill rig, that placed much of the blame for the disaster on a series of decisions made by BP.  Is this news?  No, the blame game has been in full swing almost since the time of the rig explosion.  But we did note that Transocean’s conclusion was based on a razor-sharp focus on:

“...a succession of interrelated well design, construction, and temporary abandonment decisions that compromised the integrity of the well and compounded the risk of its failure…”**  (p. 10)


Note that their report did not place its focus on the “attitudes, beliefs or values” of BP personnel or rig workers, and did not let its conclusions drift into the fuzzy answer space of “safety culture”.  In fact, the only mention of safety culture in the 200+ page report is in reference to a U.S. Coast Guard (USCG) inspection of the drill rig in 2009, which found:

“outstanding safety culture, performance during drills and condition of the rig.” (p. 201)

There is no mention of how the USCG reached such a conclusion and the report does not rely on it to support its conclusions.  It would not be the first time that a favorable safety culture assessment at a high risk enterprise preceded a major disaster.***

We also found the following thread in the findings that reinforces the importance of recognizing and understanding the impact of underlying constraints on decisions:

“The decisions, many made by the operator, BP, in the two weeks leading up to the incident, were driven by BP’s knowledge that the geological window for safe drilling was becoming increasingly narrow.” (p.10)

The fact is, decisions get squeezed all the time, resulting in choices that may reduce margins but arguably are still “acceptable”.  But such decisions do not necessarily lead to unsafe, much less disastrous, results.  Most of the time the system is not challenged, nothing bad happens, and you could even say the marginal decisions are reinforced.  Are these tradeoffs to accommodate conflicting priorities the result of a weakened safety culture?  Perhaps.  But we suspect that the individuals making the decisions would say they believed safety was their priority, and culture may have appeared normal to outsiders as well (e.g., the USCG).  The paradox occurs because decisions can trend in a weaker direction before other, more distinct evidence of a degrading culture becomes apparent.  In this case, a very big explosion.

*  B. Casselman and A. Gonzalez, "Transocean Puts Blame on BP for Gulf Oil Spill," wsj.com (June 23, 2011).

** "Macondo Well Incident: Transocean Investigation Report," Vol I, Transocean, Ltd. (June 2011).

*** For example, see our August 2, 2010 post.