Wednesday, December 21, 2011

From SCWE to Safety Culture—Time for the Soapbox

Is a satisfactory Safety Conscious Work Environment (SCWE) the same as an effective safety culture (SC)?  Absolutely not.  However, some of the reports and commentary we’ve seen on troubled facilities appear to mash the terms together.  I can’t prove it, but I suspect facilities that rely heavily on lawyers to rationalize their operations are encouraged to try to pass off SCWE as SC.  In any case, following is a review of the basic components of SC:

Safety Conscious Work Environment

An acceptable SCWE* is one where employees are encouraged and feel free to raise safety-related issues without fear of retaliation by their employer.  Note that it does not necessarily address individual employees’ knowledge of or interest in such issues.

Problem Identification and Resolution (PI&R)

PI&R is usually manifested in a facility’s corrective action program (CAP).  An acceptable CAP has a robust, transparent process for evaluating, prioritizing and resolving specific issues.  The prioritization step includes an appropriate weight for an issue’s safety-related elements.  CAP backlogs are managed to levels that employees and regulators associate with timely resolution of issues.

However, the CAP often only deals with identified issues.  Effective organizations must also anticipate problems and develop plans for addressing them.  Again, safety must have an appropriate priority.

Organizational Decision Making

The best way to evaluate an organization’s culture, including safety culture, is through an in-depth analysis of a representative sample of key decisions.  How did the decision-making process handle competing goals, set priorities, treat devil’s advocates who raised concerns about possible unfavorable outcomes, and assign resources?  Were the most qualified people involved in the decisions, regardless of their position or rank?  Note that this evaluation should not be limited to situations where the decisions led to unfavorable consequences; after all, most decisions lead to acceptable outcomes.  The question here is “How were safety concerns handled in the decision making process, independent of the outcome?”

Management Behavior

What is management’s role in all this?  Facility and corporate managers must “walk the talk” as role models demonstrating the importance of safety in all aspects of organizational life.  They must provide personal leadership that reinforces safety.  They must establish a recognition and reward system that reinforces safety.  Most importantly, they must establish and maintain the explicit and implicit weighting factors that go into all decisions.  All of these actions reinforce the desired underlying assumptions with respect to safety throughout the organization. 


Establishing a sound safety culture is not rocket science but it does require focus and understanding (a “mental model”) of how things work.  SCWE, PI&R, Decision Making and Management Behavior are all necessary components of safety culture.  Not to put too fine a point on it, but safety culture is a lot more than quoting a survey result that says “workers feel free to ask safety-related questions.”

*  SCWE questions have also been raised on the LinkedIn Nuclear Safety and Nuclear Safety Culture discussion forums.  Some of the commentary is simple bloviating but there are enough nuggets of fact or insight to make these forums worth following.

Thursday, December 8, 2011

Nuclear Industry Complacency: Root Causes

NRC Chairman Jaczko, addressing the recent INPO CEO conference, warned about possible increasing complacency in the nuclear industry.*  To support his point, he noted the two plants in column four of the ROP Action Matrix and two plants in column three, the increased number of special inspections in the past year, and the three units in extended shutdowns.  The Chairman then moved on to discuss other industry issues. 

The speech spurred us to ask: Why does the risk of complacency increase over time?  Given our interest in analyzing organizational processes, it should come as no surprise that we believe complacency is more complicated than the lack of safety-related incidents leading to reduced attention to safety.

An increase in complacency means that an organization’s safety culture has somehow changed.  Causes of such change include shifts in the organization’s underlying assumptions and decay.

Underlying Assumptions

We know from the Schein model that underlying assumptions are the bedrock for culture.  One can take those underlying assumptions and construct an (incomplete) mental model of the organization—what it values, how it operates and how it makes decisions.  Over time, as the organization builds an apparently successful safety record, the mental weights that people assign to decision factors can undergo a subtle but persistent shift to favor the visible production and cost goals over the inherently invisible safety factor.  At the same time, opportunities exist for corrosive issues, e.g., normalization of deviance, to attach themselves to the underlying assumptions.  Normalization of deviance can manifest anywhere, from slipping maintenance standards to a greater tolerance for increasing work backlogs.


Decay

An organization’s safety culture will inevitably decay over time absent effective maintenance.  In part this is caused by the shift in underlying assumptions.  In addition, decay results from saturation effects.  Saturation occurs because beating people over the head with either the same thing, e.g., espoused values, or too many different things, e.g., one safety program or similar intervention after another, has lower and lower marginal effectiveness over time.  That’s one reason new leaders are brought in to “problem” plants: to boost the safety culture by using a new messenger with a different version of the message, reset the decision making factor weights and clear the backlogs.
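The decay-plus-saturation dynamic described above can be sketched as a toy simulation.  Everything here is hypothetical - the decay rate, the intervention boost, and the saturation discount are made-up parameters chosen only to show the qualitative shape: culture erodes between interventions, and each repeated intervention buys less than the one before.

```python
# Toy model of safety culture decay with saturating interventions.
# All parameters are hypothetical; the point is the qualitative shape:
# culture erodes each period, and repeated identical interventions
# have a declining marginal effect.

def simulate(periods=30, decay=0.03, boost=0.15, saturation=0.6,
             intervene_every=10):
    culture = 1.0          # arbitrary initial "culture level"
    effectiveness = 1.0    # how much punch an intervention still has
    history = []
    for t in range(1, periods + 1):
        culture *= (1.0 - decay)              # gradual erosion
        if t % intervene_every == 0:
            culture += boost * effectiveness  # intervention helps...
            effectiveness *= saturation       # ...but less each time
        history.append(culture)
    return history

h = simulate()
# Net effect of each intervention, measured period-over-period.
lift1 = h[9]  - h[8]   # first intervention, at t=10
lift2 = h[19] - h[18]  # second, at t=20
lift3 = h[29] - h[28]  # third, at t=30
assert lift1 > lift2 > lift3  # diminishing returns from repetition
```

A new leader with a fresh message is, in this sketch, equivalent to resetting `effectiveness` back toward 1.0.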

None of this is new to regular readers of this blog.  But we wanted to gather our ideas about complacency in one post.  Complacency is not some free-floating “thing”; it is an organizational trait that emerges because of multiple dynamics operating below the level of clear visibility or measurement.

*  G.B. Jaczko, Prepared Remarks at the Institute of Nuclear Power Operations CEO Conference, Atlanta, GA (Nov. 10, 2011), p. 2, ADAMS Accession Number ML11318A134.

Monday, December 5, 2011

Regulatory Assessment of Safety Culture—Not Made in U.S.A.

Last February, the International Atomic Energy Agency (IAEA) hosted a four-day meeting of regulators and licensees on safety culture.*  “The general objective of the meeting [was] to establish a common opinion on how regulatory oversight of safety culture can be developed to foster safety culture.”  In fewer words, how can the regulator oversee and assess safety culture?

While no groundbreaking new methods for evaluating a nuclear organization’s safety culture were presented, the mere fact that there is a perceived need to develop oversight methods is encouraging.  In addition, it appears that outside the U.S. regulators are more likely to be expected to engage in safety culture oversight, if not formal regulation.

Representatives from several countries made presentations.  The NRC presentation discussed the then-current status of the effort that led to the NRC safety culture policy statement announced in June.  The presentations covering Belgium, Bulgaria, Indonesia, Romania, Switzerland and Ukraine described different efforts to include safety culture assessment into licensee evaluations.

Perhaps the most interesting material was a report on an attendee survey** administered at the start of the meeting.  The survey covered “national regulatory approaches used in the oversight of safety culture.” (p. 3)  Eighteen member states completed the survey.  Following are a few key findings:

The states were split about 50-50 between having and not having regulatory requirements related to safety culture. (p. 7)  The IAEA is encouraging regulators to get more involved in evaluating safety culture and some countries are responding to that push.

To minimize subjectivity in safety culture oversight, regulators try to use oversight practices that are transparent, understandable, objective, predictable, and both risk-informed and performance-based. (p. 13)  This is not news but it is a good thing; it means regulators are trying to use the same standards for evaluating safety culture as they use for other licensee activities.

Licensee decision-making processes are assessed using observations of work groups, probabilistic risk analysis, and during the technical inspection. (p. 15)  This seems incomplete or even weak to us.  In-depth analysis of critical decisions is necessary to reveal the underlying assumptions (the hidden, true culture) that shape decision-making.

Challenges include the difficulty of giving safety an appropriate priority in certain real-time decision-making situations and the work pressure of meeting production targets and outage schedules. (p. 16)  We have been pounding the drum about goal conflict for a long time and this survey finding simply confirms that the issue still exists.

Bottom Line

The meeting was generally consistent with our views.  Regulators and licensees need to focus on cultural artifacts, especially decisions and decision making, in the short run while trying to influence the underlying assumptions in the long run to reduce or eliminate the potential for unexpected negative outcomes.

**  A. Kerhoas, "Synthesis of Questionnaire Survey."

Wednesday, November 23, 2011

Lawyering Up

When concerns are raised about the safety culture of an organization with very significant safety responsibilities, what’s one to do?  How about bringing in the lawyers.  That appears to be the news out of the Vit Plant* in Hanford, WA.  With considerable fanfare, Bechtel unveiled a new website dedicated to its management of the vit plant.  The site provides an array of policies, articles, reports, and messages regarding safety and quality.

One of the major pieces of information on the site is a recent assessment of the state of safety culture at the vit plant.**  The conclusion of the assessment is quite positive: “Overall, we view the results from this assessment as quite strong, and similar to prior assessments conduct [sic] by the Project.” (p. 16)  The prior assessments were the 2008 and 2009 Vit Plant Opinion Surveys.

However, our readers may also recall that earlier this year the Defense Nuclear Facilities Safety Board (DNFSB) issued a report concluding that the safety culture at the WTP is “flawed.”  In a previous post we quoted from the DNFSB report as follows:

“The HSS [DOE's Office of Health, Safety and Security] review of the safety culture on the WTP project 'indicates that BNI [Bechtel National Inc.] has established and implemented generally effective, formal processes for identifying, documenting, and resolving nuclear safety, quality, and technical concerns and issues raised by employees and for managing complex technical issues.'  However, the Board finds that these processes are infrequently used, not universally trusted by the WTP project staff, vulnerable to pressures caused by budget or schedule [emphasis added], and are therefore not effective.”

Thus the DNFSB clearly has a much different view of the state of safety culture at the vit plant than does DOE or Bechtel.  We note that the DNFSB report does not appear to be among the numerous references available at the new website.  Links to the original DOE report and the recent assessment are provided.  There is also a November 17, 2011 message to all employees from Frank Russo, Project Director,*** which introduces and summarizes the 2011 Opinion Survey on the project’s nuclear safety and quality culture (NSQC).  Neither the recent assessment nor the opinion survey addresses the issues raised by the DNFSB; it is as if the DNFSB review never happened.

What really caught our attention in the recent assessment is who wrote the report - a law firm.  Their assessment was based on in-depth interviews of 121 randomly selected employees using a 19-question protocol (the report states that the protocol is attached; however, it is not part of the web link).  But the law firm did not actually conduct the interviews - “investigators” from the BSII internal audit department did so and took notes that were then provided to the lawyers.  This may give new meaning to the concept of “defense in depth.”

The same law firm also analyzed the results from the 2011 Opinion Survey.  In the message to employees, Russo asserts that the law firm has “substantial experience in interpreting [emphasis added] NSQC assessments.”  He goes on to say that the questions for the survey were developed by the WTP Independent Safety and Quality Culture Assessment (ISQCA) Team.  In our view, this executive-level team has, without question, “substantial experience” in safety culture.  Supposedly the ISQCA team was tasked with assessing the site’s culture - why then did it only develop the questions while a law firm interpreted the answers?  Strikes us as very odd.

We don’t know the true state of safety culture at the vit plant and unfortunately, the work sponsored by vit plant management does little to provide such insight or to fully vet and respond to the serious deficiencies cited in the DNFSB assessment.  If we were employees at the plant we would be anxious to hear directly from the ISQCA team. 

Reading the law firm report provides little comfort.  We have commented many times on the inherent limitations of surveys and interviews for eliciting attitudes and perceptions.  When the raw materials are interview notes from a small fraction of the employees, assessed by lawyers who were not present at the interviews, we become even more skeptical.  Several quotes from the report related to the Employee Concerns Program (ECP) illustrate our concern.

“The overwhelming majority of interviewees have never used ECP. Only 6.5% of the interviewees surveyed had ever used the program.  [Note: this means a total of nine interviewees.] There is a major difference between the views of interviewees with no personal experience with ECP and those who have used the program: the majority of the interviewees who have not used the program have a positive impression of the program, while more than half of the interviewees who have used the program have a negative impression of it.” (p. 5, emphasis added)

Our favorite quote out of the report is the following.  “Two interviewees who commented on the [ECP] program appear to have confused it with Human Resources.” (p. 6)  One only wonders if the comments were favorable.

Eventually the report gets around to a conclusion that we probably could not say any better.  “We recognize that an interview population of nine employees who have used the ECP in the past is insufficient to draw any meaningful conclusions about the program.” (p. 17)

We’re left with the following question: Why go about an assessment of safety culture in such an obtuse manner - one that is superficial in its “interpretation” of very limited data, laden with anecdotal material, and ultimately overreaching in its conclusions?

*  The "Vit Plant" is the common name for the Hanford Waste Treatment Plant (WTP).

**  Pillsbury Winthrop Shaw Pittman, LLP, "Assessment of a Safety Conscious Work Environment at the Hanford Waste Treatment Plant" (undated).  The report contains no information on when the interviews or analysis were performed.  Because a footnote refers to the 2009 Opinion Survey and a report addendum refers to an October, 2010 DOE report, we assume the assessment was performed in early-to-mid 2010.

*** WTP Comm, "Message from Frank: 2011 NSQC Employee Survey Results" (Nov. 17, 2011).  

Friday, November 11, 2011

The Mother of Bad Decisions?

This is not about safety culture, but it’s nuclear related and, given our recent emphasis on decision-making, we can’t pass over it without commenting.

The steam generators (SGs) were recently replaced at Crystal River 3.  This was a large and complex undertaking but SGs have been successfully replaced at many other plants.  The Crystal River project was more complicated because it required cutting an opening in the containment but this, too, has been successfully accomplished at other plants.

The other SG replacements were all managed by two prime contractors, Bechtel and the Steam Generator Team (SGT).  However, to save a few bucks, $15 million actually, Crystal River decided to manage the project itself.  (For perspective, the target cost for the prime contractor, exclusive of incentive fee, was $73 million.)  (Franke, Exh. JF-32, p. 8)*

Cutting the opening resulted in delamination of the containment: basically, the outer 10 inches of concrete separated from the overall 42-inch-thick structure in an area near the opening.  Repairing the plant and replacement power costs are estimated at more than $2.5 billion.**  It’s not clear when the plant will be running again, if ever.

Progress Energy Florida (PEF), the plant owner, says insurance will cover most of the costs.  We’ll see.  But PEF also wants Florida ratepayers to pay.  PEF claims they “managed and executed the SGR [steam generator replacement] project in a reasonable and prudent manner. . . .”  (Franke, p. 3)

The delamination resulted from “unprecedented and unpredictable circumstances beyond PEF's control and in spite of PEF's prudent management. . . .” (Franke, p. 2)

PEF’s “root cause investigation determined that there were seven factors that contributed to the delamination. . . . These factors combined to cause the delamination during the containment opening activities in a complex interaction that was unprecedented and unpredictable.” [emphasis added]  (Franke, p. 27)***

This is an open docket, i.e., the Florida PSC has not yet determined how much, if anything, the ratepayers will have to pay.  Will the PSC believe that a Black Swan settled at the Crystal River plant?  Or is the word “hubris” more likely to come to mind?

* “Testimony & Exhibits of Jon Franke,” Fla. Public Service Commission Docket No. 100437-EI (Oct. 10, 2011).

**  I. Penn, “Cleaning up a DIY repair on Crystal River nuclear plant could cost $2.5 billion,” St. Petersburg Times website (Oct. 9, 2011).  This article provides a good summary of the SG replacement project.

***  For the detail-oriented, “. . . the technical root cause of the CR3 wall delamination was the combination of: 1) tendon stresses; 2) radial stresses; 3) industry design engineering analysis inadequacies for stress concentration factors; 4) concrete strength properties; 5) concrete aggregate properties; and 6) the de-tensioning sequence and scope. . . . another factor, the process of removing the concrete itself, likely contributed to the extent of the delamination. . . .” From “Testimony & Exhibits of Garry Miller,” Fla. Public Service Commission Docket No. 100437-EI (Oct. 10, 2011), p. 5.

Wednesday, November 9, 2011

Ultimate Bonuses

Just when you think there is a lack of humor in the exposition of dry but critical issues, such as risk management, our old friend Nassim Nicholas Taleb comes to the rescue.*  His op-ed piece in the New York Times** earlier this week has a subdued title, “End Bonuses for Bankers,” but includes some real eye-openers.  For example, Taleb cites (with hardly concealed admiration) the ancient Hammurabi code, which protected home owners by calling for the death of the builder if the home collapsed and killed the owner.  Wait, I thought we were talking about bonuses, not capital punishment.

What Taleb is concerned about is that bonus systems in entities that pose systemic risks almost universally encourage behaviors that may not be consistent with the public good much less the long term health of the business entity.  In short he believes that bonuses provide an incentive to take risks.***  He states, “The asymmetric nature of the bonus (an incentive for success without a corresponding disincentive for failure) causes hidden risks to accumulate in the financial system and become a catalyst for disaster.”  Now just substitute “nuclear operations” for “the financial system”. 
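Taleb’s asymmetry argument can be made concrete with a toy expected-value calculation.  The numbers below are hypothetical, chosen purely to illustrate the mechanism: a manager who keeps a share of gains but absorbs none of the losses rationally prefers the riskier strategy even when its expected value to the firm is lower.

```python
# Toy illustration of bonus asymmetry (all numbers hypothetical).
# The manager earns a bonus on gains but pays nothing on losses,
# so the risky strategy looks better to the manager even though
# its expected value to the firm is worse.

def firm_ev(outcomes):
    """Expected value to the firm: probability-weighted payoff."""
    return sum(p * payoff for p, payoff in outcomes)

def manager_ev(outcomes, bonus_share=0.10):
    """Expected bonus: a share of gains, zero on losses (no clawback)."""
    return sum(p * max(payoff, 0.0) * bonus_share for p, payoff in outcomes)

# Each strategy is a list of (probability, payoff to firm in $M).
safe  = [(1.0, 10.0)]                 # certain modest gain
risky = [(0.9, 20.0), (0.1, -120.0)]  # big gain, small chance of disaster

assert firm_ev(safe) > firm_ev(risky)        # the firm prefers safe
assert manager_ev(risky) > manager_ev(safe)  # the manager prefers risky
```

The asymmetry does all the work: replace `max(payoff, 0.0)` with `payoff` (a symmetric bonus with a downside) and the manager’s preference flips back into line with the firm’s.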

Central to Taleb’s thesis is his belief that management has a large informational advantage over outside regulators and will always know more about risks being taken within their operation.  It affords management the opportunity to both take on additional risk (say to meet an incentive plan goal) and to camouflage the latent risk from regulators.

In our prior posts [here, here and here] on management incentives within the nuclear industry, we also pointed to the asymmetry of bonus metrics - the focus on operating availability and costs, the lack of metrics for safety performance, and the lack of downside incentive for failure to meet safety goals.  The concern was amplified due to the increasing magnitude of nuclear executive bonuses, both in real terms and as a percentage of total compensation. 

So what to do?  Taleb’s answer for financial institutions too big to fail is “bonuses and bailouts should never mix”; in other words, “end bonuses for bankers”.  Our answer is, “bonuses and nuclear safety culture should never mix”; “end bonuses for nuclear executives”.  Instead, gross up the compensation of nuclear executives to include the nominal level of expected bonuses.  Then let them manage nuclear operations using their best judgment to assure safety, unencumbered by conflicting incentives.

*  Taleb is best known for The Black Swan, a book focusing on the need to develop strategies, esp. financial strategies, that are robust in the face of rare and hard-to-predict events.

**  N. Taleb, “End Bonuses for Bankers,” New York Times website (Nov. 7, 2011).

*** It is widely held that the 2008 financial crisis was exacerbated, if not caused, by executives making more risky decisions than shareholders would have thought appropriate. Alan Greenspan commented: “I made a mistake in presuming that the self-interests of organizations, specifically banks and others, were such that they were best capable of protecting their own shareholders” (Testimony to Congress, quoted in A. Clark and J. Treanor, “Greenspan - I was wrong about the economy. Sort of,” The Guardian, Oct. 23, 2008). The cause is widely thought to be the use of bonuses for performance combined with limited liability.  See also J.M. Malcomson, “Do Managers with Limited Liability Take More Risky Decisions? An Information Acquisition Model”, Journal of Economics & Management Strategy, Vol. 20, Issue 1 (Spring 2011), pp. 83–120.

Friday, November 4, 2011

A Factory for Producing Decisions

The subject of this post is the compelling insights of Daniel Kahneman into behavioral economics and how we think and make decisions.  Kahneman is one of the most influential thinkers of our time and a Nobel laureate.  Two links are provided for readers who would like additional information.  One is a McKinsey Quarterly video interview* done several years ago; it runs about 17 minutes.  The second is a current review in The Atlantic** of Kahneman’s just-released book, Thinking, Fast and Slow.

Kahneman begins the McKinsey interview by suggesting that we think of organizations as “factories for producing decisions” and therefore, think of decisions as a product.  This seems to make a lot of sense when applied to nuclear operating organizations - they are the veritable “River Rouge” of decision factories.  What may be unusual for nuclear organizations is the large percentage of decisions that directly or indirectly include safety dimensions, dimensions that can be uncertain and/or significantly judgmental, and which often conflict with other business goals.  So nuclear organizations have to deliver two products: competitively priced megawatts and decisions that preserve adequate safety.

To Kahneman, treating decisions as a product logically raises the issue of quality control as a means to ensure decision quality.  At one level, quality control might focus on mistakes and ensuring that decisions avoid recurrence of mistakes.  But Kahneman sees the quality function going further, into the psychology of the decision process, to ensure, e.g., that the best information is available to decision makers, that the talents of the group surrounding the ultimate decision maker are used effectively, and that the decision-making environment is unbiased.

He notes that there is an enormous amount of resistance within organizations to improving decision processes. People naturally feel threatened if their decisions are questioned or second guessed.  So it may be very difficult or even impossible to improve the quality of decisions if the leadership is threatened too much.  But, are there ways to avoid this?  Kahneman suggests the “premortem” (think of it as the analog to a post mortem).  When a decision is being formulated (not yet made), convene a group meeting with the following premise: It is a year from now, we have implemented the decision under consideration, it has been a complete disaster.  Have each individual write down “what happened?”

The objective of the premortem is to legitimize dissent and minimize the innate “bias toward optimism” in decision analysis.  It is based on the observation that as organizations converge toward a decision, dissent becomes progressively more difficult and costly and people who warn or dissent can be viewed as disloyal.  The premortem essentially sets up a competitive situation to see who can come up with the flaw in the plan.  In essence everyone takes on the role of dissenter.  Kahneman’s belief is that the process will yield some new insights - that may not change the decision but will lead to adjustments to make the decision more robust. 

Kahneman’s ideas about decisions resonate with our thinking that the most useful focus for nuclear safety culture is the quality of organizational decisions.  It also contrasts with a recent instance of a nuclear plant that ran afoul of the NRC (Browns Ferry) and is now tagged with a degraded cornerstone and increased inspections.  As usual in the nuclear industry, TVA has called on an outside contractor to come in and perform a safety culture survey, to “... find out if people feel empowered to raise safety concerns….”***  It may be interesting to see how people feel, but we believe it would be far more powerful and useful to analyze a significant sample of recent organizational decisions to determine if they reflect an appropriate level of concern for safety.  Feelings (perceptions) are not a substitute for what is actually occurring in the decision process.

We have been working to develop ways to grade whether decisions support strong safety culture, including offering opportunities on this blog for readers to “score” actual plant decisions.  In addition we have highlighted the work of Constance Perin including her book, Shouldering Risks, which reveals the value of dissecting decision mechanics.  Perin’s observations about group and individual status and credibility and their implications for dissent and information sharing directly parallel Kahneman’s focus on the need to legitimize dissent.  We hope some of this thinking ultimately overcomes the current bias in nuclear organizations to reflexively turn to surveys and the inevitable retraining in safety culture principles.

*  "Daniel Kahneman on behavioral economics," McKinsey Quarterly video interview (May 2008).

** M. Popova, "The Anti-Gladwell: Kahneman's New Way to Think About Thinking," The Atlantic website (Nov. 1, 2011).

*** A. Smith, "Nuke plant inspections proceeding as planned," Athens [Ala.] News Courier website (Nov. 2, 2011).

Friday, October 14, 2011

Decision No. 2 Scoring Results

In July we initiated a process for readers to participate in evaluating the extent to which actual decisions made at nuclear plants were consistent with a strong safety culture.  (The decision scoring framework is discussed here and the results for the first decision are discussed here.)  Example decision 2 involved a temporary repair to a Service Water System piping elbow.  Performance of a permanent code repair was postponed until the next cold shutdown or refuel outage.

We asked readers to assess the decision in two dimensions: potential safety impact and the strength of the decision, using anchored scales to quantify the scores.  The chart shows the scoring results.  Our interpretation of the results is as follows:

As with the first decision, most of the scores coalesced in a limited range for each scoring dimension.  Based on the anchored scales, this meant most people thought the safety impact was fairly significant, likely due to the extended time period of the temporary repair, which could extend to the next refuel outage.  The people who scored safety significance in this range also scored the decision strength as one that reasonably balanced safety and other operational priorities.  Our interpretation here is that people viewed the temporary repair as a reasonable interim measure, sufficient to maintain an adequate safety margin.

Notwithstanding that most scores were in the mid range, there were also decision strength scores as low as 3 (safety had lower priority than desired) and as high as 9 (safety had high priority where competing priorities were significant).  Across this range of decision strength scores, the scores for safety impact were consistent at 8.  This clearly illustrates the potential for varying perceptions of whether a decision is consistent with a strong safety culture.  The reasons for the variation could be how people felt about the efficacy of the temporary repair, or simply different standards or expectations for how aggressively one should address the leakage problem.

It is not very difficult to see how this scoring variability could translate into similarly mixed safety culture survey results.  But unlike survey questions which tend to be fairly general and abstract, the decision scoring results provide a definitive focus for assessing the “why” of safety culture perceptions.  Training and self assessment activities could benefit from these data as well.  Perhaps most intriguing is the question of what level of decision strength is expected in an organization with a “strong” safety culture.  Is it 5 (reasonably balances…) or is something higher, in the 6 to 7 range, expected?  We note that the average decision strength for example 2 was about 5.2.
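A scoring exercise like this reduces to simple descriptive statistics.  The sketch below uses made-up scores (not the actual reader data) to show how the two dimensions might be tallied; the pairing of a consistent impact score with a wide spread in strength scores is the pattern of interest.

```python
# Hypothetical tally of decision-scoring responses (invented data,
# not the actual reader scores).  Each response rates a decision on
# two anchored 1-10 scales: safety impact and decision strength.
from statistics import mean, stdev

# (safety_impact, decision_strength) pairs - invented for illustration
responses = [(8, 5), (8, 5), (8, 6), (8, 4), (8, 3), (8, 9), (7, 5), (8, 6)]

impact   = [r[0] for r in responses]
strength = [r[1] for r in responses]

print(f"mean safety impact:     {mean(impact):.1f}")
print(f"mean decision strength: {mean(strength):.1f}")
# A wide spread in strength at a near-constant impact score signals
# differing expectations among scorers rather than differing facts.
print(f"strength spread (std dev): {stdev(strength):.2f}")
```

Unlike a survey average, the individual pairs can be traced back to the anchored-scale definitions, which is what gives the exercise its diagnostic value.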

Stay tuned for more on decision scoring.

Saturday, October 8, 2011

You Want Safety Culture? Then Pass a Law.

On October 7, 2011 California governor Brown signed SB 705 authored by state senator Mark Leno. 

The Leno bill, among many others, was inspired by a major gas pipeline explosion that occurred September 9, 2010 in San Bruno, CA resulting in multiple fatalities.  The ensuing investigations have identified a familiar litany of contributing causes: defective welds, ineffective maintenance practices, missing and incomplete records, and lax corporate management.

SB 705 adds Sections 961 and 963 to the Public Utilities Code.  Section 961 requires each gas corporation to “develop a plan for the safe and reliable operation of its commission-regulated gas pipeline facility. . . .”* (§ 961(b)(1))

Section 963 states “It is the policy of the state that the commission and each gas corporation place safety of the public and gas corporation employees as the top priority. [emphasis added]  The commission shall take all reasonable and appropriate actions necessary to carry out the safety priority policy of this paragraph consistent with the principle of just and reasonable cost-based rates.”* (§ 963(b)(3))

I was surprised that an unambiguous statement about safety’s importance was apparently missing from the state’s code.  I give senator Leno full credit for this vital contribution.

Of course, he couldn’t leave well enough alone and was quoted as saying “It’s not going to fix the situation overnight, but it changes the culture immediately.”** [emphasis added]

Now this comment is typical political braggadocio, and the culture will not change “immediately.”  However, this law will make safety more prominent on the corporate radar and eventually there should be responsive changes in policies, practices, procedures and behaviors.

*  Bill Text: CA Senate Bill 705 - 2011-2012 Regular Session

**  W. Buchanan, “Governor signs bill forcing automatic pipe valves,” S.F. Chronicle (Oct. 8, 2011). 

Monday, September 26, 2011

Beyond Training - Reinforcing Culture

One of our recurring themes has been how to strengthen safety culture, either to sustain an acceptable level of culture or to address weaknesses and improve it.  We have been skeptical of the most common initiative - retraining personnel on safety culture principles and values.  Simply put, we don’t believe you can PowerPoint or poster your way to culture improvement.

By comparison we were more favorably inclined to some of the approaches put forth in a recent New York Times interview of Andrew Thompson, a Silicon Valley entrepreneur.  As Thompson observes,

“It’s the culture of what you talk about, what you celebrate, what you reward, what you make visible.  For example, in this company, which is very heavily driven by intellectual property, if you file a patent or have your name on a patent, we give you a little foam brain.”*

Foam “brains”.  How clever.  He goes on to describe other ideas such as employees being able to recognize each other for demonstrating desired values by awarding small gold coins (a nice touch here as the coins have monetary value that can be realized or retained as a visible trophy), and volunteer teams that work on aspects of culture.  The common denominator of much of this: management doesn’t do it, employees do.

*  A. Bryant, “Speak Frankly, but Don’t Go ‘Over the Net’,” New York Times (September 17, 2011).

Monday, September 12, 2011

Understanding the Risks in Managing Risks

Our recent blog posts have discussed the work of anthropologist Constance Perin.  This post looks at her book, Shouldering Risks: The Culture of Control in the Nuclear Power Industry.*  The book presents four lengthy case studies of incidents at three nuclear power plants and Perin’s analysis, which aims to explain the cultural attributes that facilitated the incidents’ occurrence or their unfavorable evolution.

Because they fit nicely with our interest in decision-making, this post will focus on the two case studies that concerned hardware issues.**  The first case involved a leaking, unisolable valve in the reactor coolant system (RCS) that needed repacking, a routine job.  The mechanics put the valve on its backseat, opened it, observed the packing moving up (indicating that the water pressure was too high or the backseat step hadn't worked), and closed it up.  After management meetings to review the situation, the mechanics tried again, packing came out, and the leak became more serious.  The valve stem and disc had separated, a fact that was belatedly recognized.  The leak was eventually sufficiently controlled so the plant could wait until the next outage to repair/replace the valve.  

The second case involved a switchyard transformer that exhibited a hot spot during a thermography examination.  Managers initially thought they had a circulating current issue, a common problem.  After additional investigations, including people climbing ladders alongside the transformer, a cover bolt was removed and the employee saw a glow inside the transformer, the result of a major short.  Transformers can explode, and have exploded, from such thermal stresses, but the plant was able to safely shut down to repair/replace the transformer.

In both cases, there was at least one individual who knew (or strongly suspected) that something more serious was wrong from the get-go but was unable to get the rest of the organization to accept a more serious, i.e., costly, diagnosis.

Why were the plant organizations so willing, even eager, to assume the more conventional explanations for the problems they were seeing?  Perin provides a multidimensional framework that helps answer that question.

The first dimension is the tradeoff quandary, the ubiquitous tension between production and cost, including costs associated with safety.  Plant organizations are expected to be making electricity, at a budgeted cost, and that subtle (or not-so-subtle) pressure colors the discussion of any problem.  There is usually a preference for a problem explanation and corrective action that allows the plant to continue running.

Three control logics constitute a second dimension.  The calculated logics are the theory of how a plant is (or should be) designed, built, and operated.  The real-time logics consist of the knowledge of how things actually work in practice.  Policy logics come from above, and represent generalized guidelines or rules for behavior, including decision-making.  An “answer” that comes from calculated or policy logic will be preferred over one that comes from real-time logic, partly because the former have been developed by higher-status groups and partly because such answers are more defensible to corporate bosses and regulators.

Finally, traditional notions of group and individual status, and a key status property, credibility, populate a third dimension: design engineers over operators over system engineers over maintenance over others; managers over individual contributors; old-timers over newcomers.  Perin creates a construct of the various “orders”*** in a plant organization, i.e., groups of specialists such as operators or system engineers.  Each order has its own worldview, values and logics – optimum conditions for nurturing organizational silos.  Information and work flows are mediated among different orders via plant-wide programs (themselves products of calculated and policy logics).

Application to Cases

The aforementioned considerations can be applied to the two cases.  Because the valve was part of the RCS, it should have been subject to more detailed planning, including additional risk analysis and contingency preparation.  This was pointed out by a new-to-his-job work planner who was basically ignored because of his newcomer status.  And before the work started, the system engineer (SE) observed that this type of valve (which had a problem history at this plant and elsewhere) was prone to disc/stem separation and that this particular valve appeared to have the problem, based on his visual inspection (it had one less thread visible than other similar valves).  But the SE did not make his observations forcefully and/or officially (by initiating a condition report), so his (accurate) observation was not factored into the early decision-making.  Ultimately, these concerns did not sway the overall discussion, in which the schedule was the highest priority.  A radiographic examination that would have shown the disc/stem separation was not performed early on because that was an Engineering responsibility and the valve repair was a Maintenance project.

The transformer was on the non-nuclear side of the plant, which made attitudes toward it less focused and critical than those for safety-related equipment.  The hot spot was discovered by a tech who was working with a couple of thermography consultants.  Thermography was a relatively new technology at this plant and not well understood by plant managers (or trusted, because early applications had given false alarms).  The tech said the patterns he observed were not typical of circulating currents, but neither he nor the consultants (the three people on-site who understood thermography) were in the meetings where the problem was discussed.  The circulating current theory was popular because (a) the plant had experienced such problems in the past and (b) addressing it could be done without shutting down the plant.  Production pressure, the nature of past problems, and the lower status of roles and equipment that are not safety related all acted to suppress the emergent new knowledge of what the problem actually was.

Lessons Learned

Perin’s analytic constructs are complicated and not light reading.  However, the interviews in the case studies are easy to read and very revealing.  It will come as no surprise to people with consulting backgrounds that the interviewees were capable of significant introspection.  In the harsh light of hindsight, lots of folks can see what should (and could) have happened.  

The big question is what did those organizations learn?  Will they make the same mistakes again?  Probably not.  But will they misinterpret future weak or ambiguous signals of a different nascent problem?  That’s still likely.  “Conventional wisdom” codified in various logics and orders and guided by a production imperative remains a strong force working against the open discussion of alternative explanations for new experiences, especially when problem information is incomplete or fuzzy.  As Bob Cudlin noted in his August 17, 2011 post: [When dealing with risk-imbued issues] “the intrinsic uncertainties in significance determination opens the door to the influence of other factors - namely those ever present considerations of cost, schedule, plant availability, and even more personal interests, such as incentive programs and career advancement.”

*  C. Perin, Shouldering Risks: The Culture of Control in the Nuclear Power Industry, (Princeton, NJ: Princeton University Press, 2005).

**  The case studies and Perin’s analysis have been greatly summarized for this blog post.

***  The “orders” include outsiders such as NRC, INPO or corporate overseers.  Although this may not be totally accurate, I picture orders as akin to medieval guilds.

Wednesday, August 17, 2011

Additional Thoughts on Significance Culture

Our previous post introduced the work of Constance Perin,  Visiting Scholar in Anthropology at MIT, including her thesis of “significance culture” in nuclear installations.  Here we expand on the intersection of her thesis with some of our work. 

Perin places primary emphasis on the availability and integration of information to systematize and enhance the determination of risk significance.  This becomes the true organizing principle of nuclear operational safety and supplants the often hazy construct of safety culture.  We agree with the emphasis on more rigorous and informed assessments of risk as an organizing principle and focus for the entire organization. 

Perin observes: “Significance culture arises out of a knowledge-using and knowledge-creating paradigm. Its effectiveness depends less on “management emphasis” and “personnel attitudes” than on having an operational philosophy represented in goals, policies, priorities, and actions organized around effectively characterizing questionable conditions before they can escalate risk.” (Significance Culture, p. 3)*

We found a similar thought from Kenneth Brawn on a recent LinkedIn post under the Nuclear Safety Group.  He states, “Decision making, and hence leadership, is based on accurate data collection that is orchestrated, focused, real time and presented in a structured fashion for a defined audience….Managers make decisions based on stakeholder needs – the problem is that risk is not adequately considered because not enough time is taken (given) to gather and orchestrate the necessary data to provide structured information for the real time circumstances.” ** 

While seeing the potential unifying force of significance culture, we are mindful also that such determinations often are made under a cloak of precision that is not warranted or routinely achievable.  Such analyses are complex, uncertain, and subject to considerable judgment by the involved analysts and decision makers.  In other words, they are inherently fuzzy.  This limitation can only be partly remedied through better availability of information.  Nuclear safety does not generally include “bright lines” of acceptable or unacceptable risks, or finely drawn increments of risk.  Sure, PRA analyses and other “risk informed” approaches provide the illusion of quantitative precision, and often provide useful insight for devising courses of action that do not pose “undue risk” to public safety.  But one does not have to read too many Licensee Event Reports (LERs) to see that risk determinations are ultimately shades of gray.  For one example, see the background information on our decision scoring example involving a pipe leak in a 30” moderate energy piping elbow and interim repair.  The technical justification for the interim fix included terms such as “postulated”, “best estimate” and “based on the assumption”.  A full reading of the LER makes clear the risk determination involved considerable qualitative judgment by the licensee in making its case and the NRC in approving the interim measure.  That said, the NRC’s justification also rested in large part on a finding of “hardship or unusual difficulty” if a code repair were required immediately.

Where is this leading us?  Are poor safety decisions the result of a lack of quality information?  Perhaps.  However, another scenario, at least equally likely, is that the appropriate risk information may not be pursued vigorously, or the information may be interpreted in the light most favorable to the organization’s other priorities.  We believe that the intrinsic uncertainties in significance determination open the door to the influence of other factors - namely those ever present considerations of cost, schedule, plant availability, and even more personal interests, such as incentive programs and career advancement.  Where significance is fuzzy, it invites rationalization in the determination of risk and marginalization of the intrinsic uncertainties.  Thus a desired decision outcome could encourage tailoring of the risk determination to achieve the appropriate fit.  It may mean that Perin’s focus on “effectively characterizing questionable conditions” must also account for the presence and potential influence of other non-safety factors as part of the knowledge paradigm.

This brings us back to Perin’s ideas for how to pull the string and dig deeper into this subject.  She finds, “Condition reports and event reviews document not only material issues. Uniquely, they also document systemic interactions among people, priorities, and equipment — feedback not otherwise available.” (Significance Culture, p.5)  This emphasis makes a lot of sense and in her book, Shouldering Risks: The Culture of Control in the Nuclear Power Industry, she takes up the challenge of delving into the depths of a series of actual condition reports.  Stay tuned for our review of the book in a subsequent post.

*  C. Perin, “Significance Culture in Nuclear Installations,” a paper presented at the 2005 Annual Meeting of the American Nuclear Society (June 6, 2005).

**  You may be asked to join the LinkedIn Nuclear Safety group to view Mr. Brawn's comment and the discussion of which it is part.

Friday, August 12, 2011

An Anthropologist’s View

Academics in many disciplines study safety culture.  This post introduces to this blog the work of an MIT anthropologist, Constance Perin, and discusses a paper* she presented at the 2005 ANS annual meeting.

We picked a couple of the paper’s key recommendations to share with you.  First, Perin’s main point is to advocate the development of a “significance culture” in nuclear power plant organizations.  The idea is to organize knowledge and data in a manner that allows an organization to determine significance with respect to safety issues.  The objective is to increase an organization’s capabilities to recognize and evaluate questionable conditions before they can escalate risk.  We generally agree with this aim.  The real nub of safety culture effectiveness is how it shapes the way an organization responds to new or changing situations.

Perin understands that significance evaluation already occurs in both formal processes (e.g., NRC evaluations and PRAs) and in the more informal world of operational decisions, where trade-offs, negotiations, and satisficing behavior may be more dynamic and less likely to be completely rational.  She recommends that significance evaluation be ascribed a higher importance, i.e., be more formally and widely ingrained in the overall plant culture, and used as an organizing principle for defining knowledge-creating processes. 

Second, because of the importance of a plant's Corrective Action Program (CAP), Perin proposes making NRC assessment of the CAP the “eighth cornerstone” of the Reactor Oversight Process (ROP).  She criticizes the NRC’s categorization of cross-cutting issues for not being subjected to specific criteria and performance indicators.  We have a somewhat different view.  Perin’s analysis does not acknowledge that the industry places great emphasis on each of the cross-cutting issues in terms of performance indicators and monitoring, including self assessment.**  The same is true for the other cornerstones, where plants use many more indicators to track and trend performance than the few included in the ROP.  In our opinion, a real problem with the ROP is that its few indicators do not provide any reliable or forward looking picture of nuclear safety.

The fault line in the CAP itself may be better characterized as a lack of measurement and assessment of how well the CAP functions to sustain a strong safety culture.  Importantly, such an approach would evaluate whether decisions on conditions adverse to quality properly assessed not only significance but also balanced the influence of any competing priorities.  Perin also recognizes that competing priorities exist, especially in the operational world, but making the CAP a cornerstone might actually lead to increased false confidence in the CAP if its relationship with safety culture were left unexamined.

Prof. Perin has also written a book, Shouldering Risks: The Culture of Control in the Nuclear Power Industry,*** which is an ethnographic analysis of nuclear organizations and specific events they experienced.  We will be reviewing this book in a future post.  We hope that her detailed drill down on those events will yield some interesting insights, e.g., how different parts of an organization looked at the same situation but had differing evaluations of its risk implications.

We have to admit we didn’t detect Prof. Perin on our radar screen; she alerted us to the presence of her work.  Based on our limited review to date, we think we share similar perspectives on the challenges involved in attaining and maintaining a robust safety culture.

*  C. Perin, “Significance Culture in Nuclear Installations,” a paper presented at the 2005 Annual Meeting of the American Nuclear Society (June 6, 2005).

** The issue may be one of timing.  Prof. Perin based her CAP recommendation, in part, on a 2001 study that suggested licensees’ self-regulation might be inadequate.  We have the benefit of a more contemporary view.  

*** C. Perin, Shouldering Risks: The Culture of Control in the Nuclear Power Industry, (Princeton, NJ: Princeton University Press, 2005).

Friday, July 15, 2011

Decision Scoring No. 2

This post introduces the second decision scoring example.  Click here, or use the box above this post, to access the detailed decision summary and scoring feature.

This example involves a proposed non-code repair to a leak in the elbow of service water system piping.  By opting for a non-code, temporary repair, a near term plant shutdown will be avoided but the permanent repair will be deferred for as long as 20 months.  In grading this decision for safety impact and decision strength, it may be helpful to think about what alternatives were available to this licensee.  We could think of several:

-    not perform a temporary repair, as current leakage was within tech spec limits, but implement an augmented inspection and monitoring program to identify any further degradation in a timely manner.

-    perform the temporary repair as described but commit to perform the permanent repair within a shorter time period, say 6 months.

-    immediately shut down and perform the code repair.

Each of these alternatives would likely affect the potential safety impact of this leak condition and influence the perception of the decision strength.  For example a decision to shut down immediately and perform the code repair would likely be viewed as quite conservative, certainly more conservative than the other options.  Such a decision might provide the strongest reinforcement of safety culture.  The point is that none of these decisions is necessarily right or wrong, or good or bad.  They do however reflect more or less conservatism, and ultimately say something about safety culture.

Wednesday, July 13, 2011

Decision No. 1 Scoring Results

We wanted to present the results to date for the first of the decision scoring examples.  (The decision scoring framework is discussed here.)  This decision involved the replacement of a bearing in the air handling unit for a safety related pump room.  After declaring the air unit inoperable, the bearing was replaced within the LCO time window.

We asked readers to assess the decision in two dimensions: potential safety impact and the strength of the decision, using anchored scales to quantify the scores.  The chart to the left shows the scoring results with the size of the data symbols related to the number of responses.  Our interpretation of the results is as follows:

First, most of the scores did coalesce in the mid ranges of each scoring dimension.  Based on the anchored scales, this meant most people thought the safety impact associated with the air handling unit problem was fairly minimal and did not extend out in time.  This is consistent with the fact that the air handler bearing was replaced within the LCO time window.  The people who scored safety significance in this mid range also scored the decision strength as one that reasonably balanced safety and other operational priorities.  This seems consistent to us with the fact that the licensee had also ordered a new shaft for the air handler and would install it at the next outage - the new shaft being necessary for addressing the cause of the bearing problem.  Notwithstanding that most scores were in the mid range, we find it interesting that there is still a spread from 4-7 in the scoring of decision strength, and a somewhat smaller spread of 4-6 in safety impact.  This is an attribute of decision scores that might be tracked closely to identify situations where the spreads change over time - perhaps signaling either disagreement regarding the merits of the decisions or a need for better communication of the bases for decisions.

Second, while not a definitive trend, it is apparent that in the mid-range scores people tended to see decision strength in terms of safety impact.  In other words, in situations where the safety impact was viewed as greater (e.g., 6 or so), the perceived strength of the decision was viewed as somewhat less than when the safety impact was viewed as somewhat lower (e.g., 4 or so).  This trend was emphasized by the scores that rated decision strength at 9 based on a safety impact of 2.  There is intrinsic logic to this, and it may also highlight to managers that an organization’s perception of safety priorities will be directly influenced by its understanding of the safety significance of the issues involved.  One can also see the potential for decision scores “explaining” safety culture survey results, which often indicate a relatively high percentage of respondents “somewhat agreeing” that, e.g., safety is a high priority, a smaller percentage “mostly agreeing”, and a smaller percentage yet “strongly agreeing”.

Third, there were some scores that appeared to us to be “outside the ballpark”.  The scores that rated safety impact at 10 did not seem consistent with our reading of the air handling unit issue, including the note indicating that the licensee had assessed the safety significance as minimal.
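The interpretation steps above (central tendency, spread, and outlier scores) can be sketched in a few lines of code.  The (safety impact, decision strength) pairs below are invented for illustration, not the actual reader responses, and the outlier threshold is an arbitrary choice.

```python
from statistics import mean

# Hypothetical reader scores: (safety impact, decision strength) pairs on the
# anchored 1-10 scales.  Invented for illustration, not actual responses.
scores = [(4, 5), (5, 5), (5, 6), (6, 4), (4, 7), (2, 9), (10, 3)]

impacts = [i for i, _ in scores]
strengths = [s for _, s in scores]

# Central tendency and spread for each dimension.
print(f"mean impact   {mean(impacts):.1f}, spread {min(impacts)}-{max(impacts)}")
print(f"mean strength {mean(strengths):.1f}, spread {min(strengths)}-{max(strengths)}")

# Flag scores far from the consensus; a widening spread over time could signal
# disagreement about decisions or poor communication of their bases.
center_i, center_s = mean(impacts), mean(strengths)
outliers = [(i, s) for i, s in scores
            if abs(i - center_i) > 3 or abs(s - center_s) > 3]
print("outside the ballpark:", outliers)
```

With these invented scores, the strength-9/impact-2 and impact-10 responses are flagged, mirroring the kinds of outliers discussed above.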

Stay tuned for the next decision scoring example and please provide your input.

Friday, June 24, 2011

Rigged Decisions?

The Wall Street Journal reported on June 23, 2011* on an internal investigation conducted by Transocean, owner of the Deepwater Horizon drill rig, that placed much of the blame for the disaster on a series of decisions made by BP.  Is this news?  No, the blame game has been in full swing almost since the time of the rig explosion.  But we did note that Transocean’s conclusion was based on a razor sharp focus on:

“...a succession of interrelated well design, construction, and temporary abandonment decisions that compromised the integrity of the well and compounded the risk of its failure…”**  (p. 10)

Note, their report did not place the focus on the “attitudes, beliefs or values” of BP personnel or rig workers, and really did not let their conclusions drift into the fuzzy answer space of “safety culture”.  In fact the only mention of safety culture in their 200+ page report is in reference to a U.S. Coast Guard (USCG) inspection of the drill rig in 2009 which found:

“outstanding safety culture, performance during drills and condition of the rig.” (p. 201)

There is no mention of how the USCG reached such a conclusion and the report does not rely on it to support its conclusions.  It would not be the first time that a favorable safety culture assessment at a high risk enterprise preceded a major disaster.***

We also found the following thread in the findings, which reinforces the importance of recognizing and understanding the impact of underlying constraints on decisions:

“The decisions, many made by the operator, BP, in the two weeks leading up to the incident, were driven by BP’s knowledge that the geological window for safe drilling was becoming increasingly narrow.” (p.10)

The fact is, decisions get squeezed all the time, resulting in decisions that may reduce margins but arguably are still “acceptable”.  But such decisions do not necessarily lead to unsafe, much less disastrous, results.  Most of the time the system is not challenged, nothing bad happens, and you could even say the marginal decisions are reinforced.  Are these tradeoffs to accommodate conflicting priorities the result of a weakened safety culture?  Perhaps.  But we suspect that the individuals making the decisions would say they believed safety was their priority, and the culture may have appeared normal to outsiders as well (e.g., the USCG).  The paradox occurs because decisions can trend in a weaker direction before other, more distinct evidence of a degrading culture becomes apparent.  In this case, a very big explosion.

*  B. Casselman and A. Gonzalez, "Transocean Puts Blame on BP for Gulf Oil Spill," (June 23, 2011).

** "Macondo Well Incident: Transocean Investigation Report," Vol I, Transocean, Ltd. (June 2011).

*** For example, see our August 2, 2010 post.

Tuesday, June 21, 2011


Safety Culture Performance Measures

Developing forward looking performance measures for safety culture remains a key challenge today and is the logical next step following the promulgation of the NRC’s policy statement on safety culture.  The need remains high, as safety culture issues continue to be identified by the NRC only after weaknesses have developed in the safety culture and ultimately manifested in traditional (lagging) performance indicators.

Current practice has continued to rely on safety culture surveys which focus almost entirely on attitudes and perceptions about safety.  But other cultural values are also present in nuclear operations - such as meeting production goals - and it is the rationalization of competing values on a daily basis that is at the heart of safety culture.  In essence decision makers are pulled in several directions by these competing priorities and must reach answers that accord safety its appropriate priority.

Our focus is on safety management decisions made every day at nuclear plants; e.g., operability, exceeding LCO limits, LER determinations, JCOs, as well as many determinations associated with problem reporting, and corrective action.  We are developing methods to “score” decisions based on how well they balance competing priorities and to relate those scores to inference of safety culture.  As part of that process we are asking our readers to participate in the scoring of decisions that we will post each week - and then share the results and interpretation.  The scoring method will be a more limited version of our developmental effort but should illustrate some of the benefits of a decision-centric view of safety culture.
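The scoring scheme described above can be pictured as a simple record with the two anchored 1-10 dimensions.  This is a minimal sketch, not our actual developmental scoring method (which is richer); the anchor wording, apart from the mid-scale "reasonably balances" anchor, is invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical anchor wording for the decision strength scale.  Only the
# mid-scale "reasonably balances" anchor is taken from our score cards; the
# others are invented for illustration.
STRENGTH_ANCHORS = {
    1: "safety clearly subordinated to competing priorities",
    5: "reasonably balances safety and other operational priorities",
    10: "strongly conservative; safety given clear precedence",
}

@dataclass
class DecisionScore:
    """One reader's scoring of a safety management decision."""
    decision_id: str
    safety_impact: int  # 1 (minimal) .. 10 (major) potential safety impact
    strength: int       # 1 .. 10 decision strength per the anchored scale

    def __post_init__(self):
        # Reject scores that fall off the anchored scales.
        for name in ("safety_impact", "strength"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be on the 1-10 anchored scale")

# Example: a reader scores a decision as mid-range on both dimensions.
score = DecisionScore("example-1", safety_impact=5, strength=5)
print(STRENGTH_ANCHORS[score.strength])
```

Collecting such records each week is what makes it possible to chart, trend, and compare reader scores across decisions.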

Look in the right column for the links to Score Decisions.  They will take you to the decision summaries and score cards.  We look forward to your participation and welcome any questions or comments.

Wednesday, June 15, 2011

DNFSB Goes Critical

The Defense Nuclear Facilities Safety Board (DNFSB) issued a “strongly worded” report* this week on safety culture at the Hanford Waste Treatment and Immobilization Plant (WTP).  The DNFSB determined that the safety culture at the WTP is “flawed” and “that both DOE and contractor project management behaviors reinforce a subculture at WTP that deters the timely reporting, acknowledgement, and ultimate resolution of technical safety concerns.”

For example, the Board found that “expressions of technical dissent affecting safety at WTP, especially those affecting schedule or budget, were discouraged, if not opposed or rejected without review” and heard testimony from several witnesses that “raising safety issues that can add to project cost or delay schedule will hurt one's career and reduce one's participation on project teams.”

Only several months ago we blogged about initiatives by DOE regarding safety culture at its facilities.  In our critique we observed, “Goal conflict, often expressed as safety vs mission, should obviously be avoided but its insidiousness is not adequately recognized [in the DOE initiatives]."  Seems like the DNFSB put their finger on this at WTP.  In fact the DNFSB report states:

“The HSS [DOE's Office of Health, Safety and Security] review of the safety culture on the WTP project 'indicates that BNI [Bechtel National Inc.] has established and implemented generally effective, formal processes for identifying, documenting, and resolving nuclear safety, quality, and technical concerns and issues raised by employees and for managing complex technical issues.'  However, the Board finds that these processes are infrequently used, not universally trusted by the WTP project staff, vulnerable to pressures caused by budget or schedule [emphasis added], and are therefore not effective.” 

The Board was not done with goal conflict. It went on to cite the experience of a DOE expert witness:

“The testimony of several witnesses confirms that the expert witness was verbally admonished by the highest level of DOE line management at DOE's debriefing meeting following this session of the hearing.  Although testimony varies on the exact details of the verbal interchange, it is clear that strong hostility was expressed toward the expert witness whose testimony strayed from DOE management's policy while that individual was attempting to adhere to accepted professional standards.”

This type of intimidation need not be, and generally is not, so explicit. The same message can be sent through many subtle and insidious channels which are equally effective.  It is goal conflict of another stripe - we refer to it as “organizational stress” - where the organizational interests of individuals - promotions, performance appraisals, work assignments, performance incentives, etc. - create another dimension of tension in achieving safety priority.  It is just as real and a lot more personal than the larger goal conflicts of cost and schedule pressures.

*  Defense Nuclear Facilities Safety Board, Recommendation 2011-1 to the Secretary of Energy "Safety Culture at the Waste Treatment and Immobilization Plant" (Jun 9, 2011).

Thursday, May 26, 2011

Upper Big Branch 1

A few days ago the Governor’s Independent Investigation Panel issued its report on the Upper Big Branch coal mine explosion of April 5, 2010.  The report is over 100 pages and contains considerable detail on the events and circumstances leading up to the disaster, coal mining technology, and safety issues.  It is well worth reading for anyone in the business of assuring safety in a complex and high-risk enterprise.  We anticipate doing several blog posts on material from the report but wanted to start with a brief quote from the foreword to the report, summarizing its main conclusions.

“A genuine commitment to safety means not just examining miners’ work practices and behaviors.  It means evaluating management decisions up the chain of command - all the way to the boardroom - about how miners’ work is organized and performed.”*

We believe this conclusion is very much on the mark for safety management and for the safety culture that supports it in a well-managed organization.  It highlights what to us has appeared to be an over-emphasis in the nuclear industry on worker practices and behaviors - and “values”.  And it focuses attention on management decisions - decisions that give safety appropriate weight in a world of competing priorities and interests - as the sine qua non of safety.  As we have discussed in many of our posts, we are concerned with the nuclear industry’s emphasis on safety culture surveys and training in safety culture principles and values as the primary tools for assuring a strong safety culture.  Rarely do culture assessments focus on the decisions that underlie the management of safety, or examine the context and influence of factors such as impacts on operations, availability of resources, personnel incentives and advancement, corporate initiatives and goals, and outside factors such as political pressure.  The Upper Big Branch report delves into these issues and builds a compelling basis for the above conclusion, a conclusion that is not limited to the coal industry.

*  Governor’s Independent Investigation Panel, “Report to the Governor: Upper Big Branch,” National Technology Transfer Center, Wheeling Jesuit University (May 2011), p. 4.

Thursday, May 19, 2011

Mental Models and Learning

A recent New York Times article on teaching methods* caught our eye.  It reported an experiment by college physics professors to improve their freshmen students’ understanding and retention of introductory material.  The students comprised two large (260+) classes that were usually taught via lectures.  For one week, teaching assistants used a collaborative, team-oriented approach for one of the classes.  Afterward, this group scored higher on a subsequent test than the group that received the traditional lecture.

One of the instructors reported, “. . . this class actively engages students and allows them time to synthesize new information and incorporate it into a mental model . . . . When they can incorporate things into a mental model, we find much better retention.”

We are big believers in mental models, those representations of the world that people create in their minds to make sense of information and experience.  They are a key component of our system dynamics approach to understanding and modeling safety culture.  Our NuclearSafetySim model illustrates how safety culture interacts with other variables in organizational decision-making; a primary purpose for this computer model is to create a realistic mental model in users’ minds.

Because this experiment helped the students form more useful mental models, our reaction to it is generally favorable.  On the other hand, why is the researchers’ “insight” even news?  Why wouldn’t a more engaging approach lead to a better understanding of any subject?  Don’t most of you develop a better understanding when you do the lab work, code your own programs, write the reports you sign, or practice decision-making in a simulated environment?

*  B. Carey, “Less Talk, More Action: Improving Science Learning,” New York Times (May 12, 2011).

Tuesday, May 10, 2011

Shifting the Burden

Pitot tube
This post emanates from the ongoing investigations of the crash of Air France flight 447 from Rio de Janeiro to Paris.  In some respects it is a follow-up to our January 27, 2011 post on Air France’s safety culture.  An article in the New York Times Sunday Magazine* explores some of the mysteries surrounding the loss of the plane in the mid-Atlantic.  One of the possible theories for the crash involves the pitot tubes used on the Airbus plane.  Pitot tubes are instruments used on aircraft to measure airspeed.  A pitot tube measures the difference between total (stagnation) and static pressure to determine dynamic pressure and therefore the velocity of the air stream.  Care must be taken to assure that the pitot tubes do not become clogged with ice or other foreign matter, as clogging would interrupt or corrupt the airspeed signal provided to the pilots and the autopilot system.
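The pressure-to-velocity relationship described above can be sketched with the incompressible Bernoulli relation, v = sqrt(2·Δp/ρ).  This is only an illustrative simplification - real avionics apply compressibility corrections at jet speeds - and the numeric values below are invented for the example:

```python
import math

def airspeed_from_pitot(total_pressure_pa, static_pressure_pa,
                        air_density_kg_m3=1.225):
    """Indicated airspeed (m/s) from a pitot-static pressure difference.

    Uses the incompressible Bernoulli relation v = sqrt(2 * dp / rho).
    Sea-level standard density (1.225 kg/m^3) is the default.
    """
    dynamic_pressure = total_pressure_pa - static_pressure_pa
    if dynamic_pressure < 0:
        raise ValueError("total pressure below static pressure - sensor fault?")
    return math.sqrt(2.0 * dynamic_pressure / air_density_kg_m3)

# A tube clogged with ice can freeze or zero the sensed dynamic pressure,
# so the computed airspeed no longer tracks the true velocity at all.
v = airspeed_from_pitot(102_000.0, 101_325.0)  # 675 Pa of dynamic pressure
```

The point of the sketch is how little the instrument actually measures: corrupt one pressure input and the derived airspeed is silently wrong.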

On the flight 447 aircraft, three Thales AA model pitot tubes were in use.  They are produced by a French company and cost approximately $3500 each.  The Times article goes on to explain:

" the summer of 2009, the problem of icing on the Thales AA was known to be especially common….Between 2003 and 2008, there were at least 17 cases in which the Thales AA had problems on the Airbus A330 and its sister plane, the A340.  In September 2007, Airbus issued a ‘service bulletin’ suggesting that airlines replace the AA pitots with a newer model, the BA, which was said to work better in ice.”

Air France’s response to the service bulletin was a policy of replacing the AA tubes “only when a failure occurred”.  A year later Air France asked Airbus for “proof” that the model BA tubes worked better in ice.  It took Airbus another six to seven months to perform tests demonstrating the superior performance of the BA tubes, after which Air France proceeded with implementing the recommended change for its A330 aircraft.  Unfortunately, the new probes had not yet been installed at the time of flight 447.

Much is still unknown about whether in fact the pitot tubes played a role in the crash of flight 447 and of the details of Air France’s consideration of deploying replacements.  But there is a sufficient framework to pose some interesting questions regarding how safety considerations were balanced in the process, and what might be inferred about the Air France safety culture.  Most clearly it highlights how fundamental the decision making process is to safety culture.

What is clear is that Air France’s approach to this problem “shifted the burden” from assuring that something was safe to proving that it was unsafe.  In legal usage this involves transferring the obligation to prove a fact in controversy from one party to another.  In systems thinking (which you may have noticed we strongly espouse) it denotes a classic dynamic archetype: a problem arises and can be ameliorated through either a short-term, symptom-based response or a fundamental solution that may take additional time and/or resources to implement.  Choosing the short-term fix provides relief and reinforces belief in the efficacy of the response.  Meanwhile the underlying problem goes unaddressed.  For Air France, the service bulletin created a problem.  Air France could have immediately replaced the pitot tubes or undertaken its own assessment of pitot tubes with replacement to follow.  This would have taken time and resources.  Air France also did not appear to address the threshold question of whether the existing AA model instruments were adequate - in nuclear industry terms, were they “operable” and able to perform their safety function?  Nor did Air France implement interim measures, such as retraining to improve pilots’ recognition of and response to pitot tube failures or incorrect readings.  Instead, Air France shifted the burden back to Airbus to “prove” its recommendation.  The difference between showing that something is not safe versus showing that it is safe is as wide as, well, the Atlantic Ocean.
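The archetype described above can be sketched as a toy discrete-time simulation.  All coefficients here are invented for illustration, not taken from any real model: the symptomatic fix suppresses the visible symptom each period while the underlying problem quietly grows, whereas the fundamental fix erodes the problem itself.

```python
def simulate(steps, use_quick_fix):
    """Toy sketch of the 'shifting the burden' system-dynamics archetype.

    'problem' is the underlying condition; 'symptom' is what management sees.
    Each step, the organization applies either a symptomatic fix or a
    fundamental fix.  Coefficients are arbitrary illustrative values.
    """
    problem = 1.0
    symptom_history = []
    for _ in range(steps):
        if use_quick_fix:
            symptom = problem * 0.3   # symptom suppressed, looks like action...
            problem *= 1.05           # ...while the root cause keeps growing
        else:
            problem *= 0.8            # fundamental fix erodes the root cause
            symptom = problem         # symptom tracks the real condition
        symptom_history.append(symptom)
    return problem, symptom_history

quick_problem, _ = simulate(10, use_quick_fix=True)
fund_problem, _ = simulate(10, use_quick_fix=False)
# After ten periods the quick-fix path leaves the underlying problem
# larger than the fundamental-fix path, despite better-looking symptoms.
```

The sketch captures the complacency parallel made below: the symptomatic path produces reassuring indicators precisely while the real condition worsens.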

What we find particularly interesting about shifting the burden is that it is just another side of the complacency coin.  Most people engaged in safety culture science recognize that complacency is a potential contributor to the decay and loss of effectiveness of safety culture.  Everything appears to be going OK, so there is less perceived need to pursue issues, particularly those whose safety impact is unclear.  Not pursuing root causes, not verifying the efficacy of corrective actions, loss of questioning attitude, and lack of resources can all be telltale signs of complacency.  The interesting thing about shifting the burden is that it yields much the same result - but with the appearance that action is being taken.

A footnote to the story is the response of Air Caraibes to similar circumstances in the same time frame.  The Times article indicates Air Caraibes experienced two “near misses” with Thales AA pitot tubes on A330 aircraft.  It immediately replaced the parts and notified regulators.

*  W.S. Hylton, "What Happened to Air France Flight 447?" New York Times Magazine (May 4, 2011).