Friday, July 15, 2011

Decision Scoring No. 2

This post introduces the second decision scoring example.  Click here, or the box above this post, to access the detailed decision summary and scoring feature.  

This example involves a proposed non-code repair to a leak in the elbow of service water system piping.  By opting for a non-code, temporary repair, the licensee would avoid a near term plant shutdown, but the permanent repair would be deferred for as long as 20 months.  In grading this decision for safety impact and decision strength, it may be helpful to think about what alternatives were available to this licensee.  We could think of several:

-    not perform a temporary repair as current leakage was within tech spec limits, but implement an augmented inspection and monitoring program to identify any further degradation promptly.

-    perform the temporary repair as described but commit to perform the permanent repair within a shorter time period, say 6 months.

-    immediately shut down and perform the code repair.

Each of these alternatives would likely affect the potential safety impact of this leak condition and influence the perception of the decision strength.  For example, a decision to shut down immediately and perform the code repair would likely be viewed as quite conservative, certainly more conservative than the other options.  Such a decision might provide the strongest reinforcement of safety culture.  The point is that none of these decisions is necessarily right or wrong, or good or bad.  They do, however, reflect more or less conservatism, and ultimately say something about safety culture.

Wednesday, July 13, 2011

Decision No. 1 Scoring Results


We wanted to present the results to date for the first of the decision scoring examples.  (The decision scoring framework is discussed here.)  This decision involved the replacement of a bearing in the air handling unit for a safety related pump room.  After declaring the air unit inoperable, the bearing was replaced within the LCO time window.

We asked readers to assess the decision in two dimensions: potential safety impact and the strength of the decision, using anchored scales to quantify the scores.  The chart to the left shows the scoring results with the size of the data symbols related to the number of responses.  Our interpretation of the results is as follows:

First, most of the scores did coalesce in the mid ranges of each scoring dimension.  Based on the anchored scales, this meant most people thought the safety impact associated with the air handling unit problem was fairly minimal and did not extend out in time.  This is consistent with the fact that the air handler bearing was replaced within the LCO time window.  The people that scored safety significance in this mid range also scored the decision strength as one that reasonably balanced safety and other operational priorities.  This seems consistent to us with the fact that the licensee had also ordered a new shaft for the air handler and would install it at the next outage - the new shaft being necessary for addressing the cause of the bearing problem.  Notwithstanding that most scores were in the mid range, we find it interesting that there is still a spread from 4-7 in the scoring of decision strength, and a somewhat smaller spread of 4-6 in safety impact.  This would be an attribute of decision scores that might be tracked closely to identify situations where the spreads change over time - perhaps signaling that either there is disagreement regarding the merits of the decisions or that there is a need for better communication of the bases for decisions.

Second, while not a definitive trend, it is apparent that in the mid-range scores people tended to see decision strength in terms of safety impact.  In other words, in situations where the safety impact was viewed as greater (e.g., 6 or so), the perceived strength of the decision was viewed as somewhat less than when the safety impact was viewed as somewhat lower (e.g., 4 or so).  This trend was emphasized by the scores that rated decision strength at 9 based on safety impact of 2.  There is intrinsic logic to this, and it may also highlight to managers that an organization’s perception of safety priorities will be directly influenced by its understanding of the safety significance of the issues involved.  One can also see the potential for decision scores “explaining” safety culture survey results, which often indicate a relatively high percentage of respondents “somewhat agreeing” that, e.g., safety is a high priority, a smaller percentage “mostly agreeing” and a smaller percentage yet “strongly agreeing”.

Third, there were some scores that appeared to us to be “outside the ballpark”.  The scores that rated safety impact at 10 did not seem consistent with our reading of the air handling unit issue, including the note indicating that the licensee had assessed the safety significance as minimal.
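For readers who want to try the spread tracking suggested in the first point, here is a minimal sketch in Python.  The response values are hypothetical, made up for illustration, and are not the actual survey data; the function name `spread` is ours as well.  It simply reports the low score, the high score and the width of the range for each scoring dimension.

```python
# Hypothetical (safety_impact, decision_strength) responses for one decision.
# These values are illustrative only; they are NOT the actual reader scores.
responses = [(4, 7), (5, 6), (6, 4), (5, 5), (4, 6)]


def spread(values):
    """Return (lowest, highest, width) for a collection of numeric scores."""
    low, high = min(values), max(values)
    return low, high, high - low


# Separate the two scoring dimensions before summarizing each one.
impact_scores = [impact for impact, _ in responses]
strength_scores = [strength for _, strength in responses]

impact_spread = spread(impact_scores)      # (4, 6, 2) for the data above
strength_spread = spread(strength_scores)  # (4, 7, 3) for the data above

print("safety impact spread:", impact_spread)
print("decision strength spread:", strength_spread)
```

Recomputing these widths for each new decision, and watching whether they grow or shrink, would be one simple way to follow changes in the spreads over time.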

Stay tuned for the next decision scoring example and please provide your input.

Friday, June 24, 2011

Rigged Decisions?

The Wall Street Journal reported on June 23, 2011* on an internal investigation conducted by Transocean, owner of the Deepwater Horizon drill rig, that placed much of the blame for the disaster on a series of decisions made by BP.  Is this news?  No, the blame game has been in full swing almost since the time of the rig explosion.  But we did note that Transocean’s conclusion was based on a razor sharp focus on:

“...a succession of interrelated well design, construction, and temporary abandonment decisions that compromised the integrity of the well and compounded the risk of its failure…”**  (p. 10)


Note, their report did not place the focus on the “attitudes, beliefs or values” of BP personnel or rig workers, and really did not let their conclusions drift into the fuzzy answer space of “safety culture”.  In fact the only mention of safety culture in their 200+ page report is in reference to a U.S. Coast Guard (USCG) inspection of the drill rig in 2009 which found:

“outstanding safety culture, performance during drills and condition of the rig.” (p. 201)

There is no mention of how the USCG reached such a conclusion and the report does not rely on it to support its conclusions.  It would not be the first time that a favorable safety culture assessment at a high risk enterprise preceded a major disaster.***

We also found the following thread in the findings that reinforces the importance of recognizing and understanding the impact of underlying constraints on decisions:

“The decisions, many made by the operator, BP, in the two weeks leading up to the incident, were driven by BP’s knowledge that the geological window for safe drilling was becoming increasingly narrow.” (p.10)

The fact is, decisions get squeezed all the time, resulting in choices that may reduce margins but arguably are still “acceptable”.  But such decisions do not necessarily lead to unsafe, much less disastrous, results.  Most of the time the system is not challenged, nothing bad happens, and you could even say the marginal decisions are reinforced.  Are these tradeoffs to accommodate conflicting priorities the result of a weakened safety culture?  Perhaps.  But we suspect that the individuals making the decisions would say they believed safety was their priority, and culture may have appeared normal to outsiders as well (e.g., the USCG).  The paradox occurs because decisions can trend in a weaker direction before other, more distinct evidence of degrading culture becomes apparent.  In this case, a very big explosion.

*  B. Casselman and A. Gonzalez, "Transocean Puts Blame on BP for Gulf Oil Spill," wsj.com (June 23, 2011).

** "Macondo Well Incident: Transocean Investigation Report," Vol I, Transocean, Ltd. (June 2011).

*** For example, see our August 2, 2010 post.

Tuesday, June 21, 2011

Decisions….Decisions

Safety Culture Performance Measures

Developing forward looking performance measures for safety culture remains a key challenge today and is the logical next step following the promulgation of the NRC’s policy statement on safety culture.  The need remains high as safety culture issues continue to be identified by the NRC subsequent to weaknesses developing in the safety culture and ultimately manifesting in traditional (lagging) performance indicators.

Current practice has continued to rely on safety culture surveys which focus almost entirely on attitudes and perceptions about safety.  But other cultural values are also present in nuclear operations - such as meeting production goals - and it is the rationalization of competing values on a daily basis that is at the heart of safety culture.  In essence decision makers are pulled in several directions by these competing priorities and must reach answers that accord safety its appropriate priority.

Our focus is on safety management decisions made every day at nuclear plants; e.g., operability, exceeding LCO limits, LER determinations, JCOs, as well as many determinations associated with problem reporting and corrective action.  We are developing methods to “score” decisions based on how well they balance competing priorities and to relate those scores to inferences about safety culture.  As part of that process we are asking our readers to participate in the scoring of decisions that we will post each week - and we will then share the results and our interpretation.  The scoring method will be a more limited version of our developmental effort but should illustrate some of the benefits of a decision-centric view of safety culture.

Look in the right column for the links to Score Decisions.  They will take you to the decision summaries and score cards.  We look forward to your participation and welcome any questions or comments.

Wednesday, June 15, 2011

DNFSB Goes Critical

Hanford WTP
The Defense Nuclear Facilities Safety Board (DNFSB) issued a “strongly worded” report* this week on safety culture at the Hanford Waste Treatment and Immobilization Plant (WTP).  The DNFSB determined that the safety culture at the WTP is “flawed” and “that both DOE and contractor project management behaviors reinforce a subculture at WTP that deters the timely reporting, acknowledgement, and ultimate resolution of technical safety concerns.”

For example, the Board found that “expressions of technical dissent affecting safety at WTP, especially those affecting schedule or budget, were discouraged, if not opposed or rejected without review” and heard testimony from several witnesses that “raising safety issues that can add to project cost or delay schedule will hurt one's career and reduce one's participation on project teams.”

Only several months ago we blogged about initiatives by DOE regarding safety culture at its facilities.  In our critique we observed, “Goal conflict, often expressed as safety vs mission, should obviously be avoided but its insidiousness is not adequately recognized [in the DOE initiatives].”  It seems the DNFSB put its finger on this at WTP.  In fact the DNFSB report states:

“The HSS [DOE's Office of Health, Safety and Security] review of the safety culture on the WTP project 'indicates that BNI [Bechtel National Inc.] has established and implemented generally effective, formal processes for identifying, documenting, and resolving nuclear safety, quality, and technical concerns and issues raised by employees and for managing complex technical issues.'  However, the Board finds that these processes are infrequently used, not universally trusted by the WTP project staff, vulnerable to pressures caused by budget or schedule [emphasis added], and are therefore not effective.” 

The Board was not done with goal conflict. It went on to cite the experience of a DOE expert witness:

“The testimony of several witnesses confirms that the expert witness was verbally admonished by the highest level of DOE line management at DOE's debriefing meeting following this session of the hearing.  Although testimony varies on the exact details of the verbal interchange, it is clear that strong hostility was expressed toward the expert witness whose testimony strayed from DOE management's policy while that individual was attempting to adhere to accepted professional standards.”

This type of intimidation need not be, and generally is not, so explicit. The same message can be sent through many subtle and insidious channels which are equally effective.  It is goal conflict of another stripe - we refer to it as “organizational stress” - where the organizational interests of individuals - promotions, performance appraisals, work assignments, performance incentives, etc. - create another dimension of tension in achieving safety priority.  It is just as real and a lot more personal than the larger goal conflicts of cost and schedule pressures.


*  Defense Nuclear Facilities Safety Board, Recommendation 2011-1 to the Secretary of Energy "Safety Culture at the Waste Treatment and Immobilization Plant" (June 9, 2011).

Thursday, May 26, 2011

Upper Big Branch 1

A few days ago the Governor’s Independent Investigation Panel issued its report on the Upper Big Branch coal mine explosion of April 5, 2010.  The report is over 100 pages and contains considerable detail on the events and circumstances leading up to the disaster, coal mining technology and safety issues.  It is well worth reading for anyone in the business of assuring safety in a complex and high risk enterprise.  We anticipate doing several blog posts on material from the report but wanted to start with a brief quote from the foreword to the report, summarizing its main conclusions.

“A genuine commitment to safety means not just examining miners’ work practices and behaviors.  It means evaluating management decisions up the chain of command - all the way to the boardroom - about how miners’ work is organized and performed.”*

We believe this conclusion is very much on the mark for safety management and for the safety culture that supports it in a well managed organization.  It highlights what to us has appeared to be an over-emphasis in the nuclear industry on worker practices and behaviors - and “values”.   And it focuses attention on management decisions - decisions that maintain an appropriate weight to safety in a world of competing priorities and interests - as the sine qua non of safety.  As we have discussed in many of our posts, we are concerned with the emphasis by the nuclear industry on safety culture surveys and training in safety culture principles and values as the primary tools of assuring a strong safety culture.  Rarely do culture assessments focus on the decisions that underlie the management of safety to examine the context and influence of factors such as impacts on operations, availability of resources, personnel incentives and advancement, corporate initiatives and goals, and outside factors such as political pressure.  The Upper Big Branch report delves into these issues and builds a compelling basis for the above conclusion, a conclusion that is not limited to the coal industry.


*  Governor’s Independent Investigation Panel, “Report to the Governor: Upper Big Branch,” National Technology Transfer Center, Wheeling Jesuit University (May 2011), p. 4.

Thursday, May 19, 2011

Mental Models and Learning

A recent New York Times article on teaching methods* caught our eye.  It reported an experiment by college physics professors to improve their freshmen students’ understanding and retention of introductory material.  The students comprised two large (260+) classes that usually were taught via lectures.  For one week, teaching assistants used a collaborative, team-oriented approach for one of the classes.  Afterward, this group scored higher on the test than the group that received the traditional lecture.  

One of the instructors reported, “. . . this class actively engages students and allows them time to synthesize new information and incorporate it into a mental model . . . . When they can incorporate things into a mental model, we find much better retention.”

We are big believers in mental models, those representations of the world that people create in their minds to make sense of information and experience.  They are a key component of our system dynamics approach to understanding and modeling safety culture.  Our NuclearSafetySim model illustrates how safety culture interacts with other variables in organizational decision-making; a primary purpose for this computer model is to create a realistic mental model in users’ minds.

Because this experiment helped the students form more useful mental models, our reaction to it is generally favorable.  On the other hand, why is the researchers’ “insight” even news?  Why wouldn’t a more engaging approach lead to a better understanding of any subject?  Don’t most of you develop a better understanding when you do the lab work, code your own programs, write the reports you sign, or practice decision-making in a simulated environment?

*  B. Carey, “Less Talk, More Action: Improving Science Learning,” New York Times (May 12, 2011).

Tuesday, May 10, 2011

Shifting the Burden

Pitot tube
This post emanates from the ongoing investigations of the crash of Air France flight 447 from Rio de Janeiro to Paris.  In some respects it is a follow-up to our January 27, 2011 post on Air France’s safety culture.  An article in the New York Times Sunday Magazine* explores some of the mysteries surrounding the loss of the plane in mid-Atlantic.  One of the possible theories for the crash involves the pitot tubes used on the Airbus plane.  Pitot tubes are instruments used on aircraft to measure air speed.  The pitot tube measures the difference between total (stagnation) and static pressure to determine dynamic pressure and therefore velocity of the air stream.  Care must be taken to assure that the pitot tubes do not become clogged with ice or other foreign matter, since a blockage would interrupt or corrupt the airspeed signal provided to the pilots and the auto-pilot system.
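To make the pressure-to-velocity relationship concrete, here is a minimal sketch in Python.  It assumes the simple incompressible Bernoulli relation, v = sqrt(2 × (total pressure − static pressure) / air density), which is a reasonable approximation only at low speeds; an airliner at cruise requires compressibility corrections that are not shown here.  The function name, the sea-level density default and the sample pressures are our own illustrative assumptions, not values from the investigation.

```python
import math

# Standard sea-level air density in kg/m^3 (an assumed default for illustration).
SEA_LEVEL_AIR_DENSITY = 1.225


def airspeed_from_pitot(total_pressure_pa, static_pressure_pa,
                        air_density=SEA_LEVEL_AIR_DENSITY):
    """Estimate airspeed (m/s) from pitot-static pressures in pascals.

    Uses the incompressible Bernoulli relation, so it is only a
    low-speed approximation.
    """
    dynamic_pressure = total_pressure_pa - static_pressure_pa
    if dynamic_pressure < 0:
        # Total pressure below static pressure is not physically meaningful
        # for a working probe; treat it as a bad reading.
        raise ValueError("total pressure is below static pressure")
    return math.sqrt(2.0 * dynamic_pressure / air_density)


# Illustrative reading: a 612.5 Pa dynamic pressure at sea-level density.
speed = airspeed_from_pitot(101937.5, 101325.0)
print(f"indicated airspeed: {speed:.1f} m/s")
```

The sketch also shows why a blockage matters: the computed speed depends entirely on the measured pressure difference, so a corrupted total pressure reading produces a wrong airspeed with nothing in the calculation itself to flag the error.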

On the flight 447 aircraft, three Thales AA model pitot tubes were in use.  They are produced by a French company and cost approximately $3500 each.  The Times article goes on to explain:

“...by the summer of 2009, the problem of icing on the Thales AA was known to be especially common….Between 2003 and 2008, there were at least 17 cases in which the Thales AA had problems on the Airbus A330 and its sister plane, the A340.  In September 2007, Airbus issued a ‘service bulletin’ suggesting that airlines replace the AA pitots with a newer model, the BA, which was said to work better in ice.”

In response to the service bulletin, Air France established a policy to replace the AA tubes “only when a failure occurred”.  A year later Air France asked Airbus for “proof” that the model BA tubes worked better in ice.  It took Airbus another 6-7 months to perform tests that demonstrated the superior performance of the BA tubes, following which Air France proceeded with implementing the recommended change for its A330 aircraft.  Unfortunately the new probes had not yet been installed at the time of flight 447.

Much is still unknown about whether in fact the pitot tubes played a role in the crash of flight 447 and of the details of Air France’s consideration of deploying replacements.  But there is a sufficient framework to pose some interesting questions regarding how safety considerations were balanced in the process, and what might be inferred about the Air France safety culture.  Most clearly it highlights how fundamental the decision making process is to safety culture.

What is clear is that Air France’s approach to this problem “shifted the burden” from assuring that something was safe to proving that it was unsafe.  In legal usage this involves transferring the obligation to prove a fact in controversy from one party to another.  Or in systems thinking (which you may have noticed we strongly espouse) it denotes a classic dynamic archetype - a problem arises, and it can be ameliorated through either a short term, symptom based response or a fundamental solution that may take additional time and/or resources to implement.  Choosing the short term fix provides relief and reinforces the belief in the efficacy of the response.  Meanwhile the underlying problem goes unaddressed.  For Air France, the service bulletin created a problem.  Air France could have immediately replaced the pitot tubes or undertaken its own assessment of pitot tubes with replacement to follow.  This would have taken time and resources.  Nor did Air France appear to try to address the threshold question of whether the existing AA model instruments were adequate - in nuclear industry terms, were they “operable” and able to perform their safety function?  Air France apparently did not even implement interim measures such as retraining to improve pilots’ recognition of and response to pitot tube failures or incorrect readings.  Instead, Air France shifted the burden back to Airbus to “prove” its recommendation.  The difference between showing that something is not safe versus that it is safe is as wide as, well, the Atlantic Ocean.

What we find particularly interesting about shifting the burden is that it is just another side of the complacency coin.  Most people engaged in safety culture science recognize that complacency is a potential contributor to the decay and loss of effectiveness of safety culture.  Everything appears to be going OK so there is less need to pursue issues, particularly those lacking safety impact clarity.  Not pursuing root causes, not verifying corrective action efficacy, loss of questioning attitude and lack of resources could all be telltale signs of complacency.  The interesting thing about shifting the burden is that it yields much the same result - but with the appearance that action is being taken. 

The footnote to the story is the response of Air Caraibes to similar circumstances in this time frame.  The Times article indicates Air Caraibes experienced two “near misses” with Thales AA pitot tubes on A330 aircraft.  They immediately replaced the parts and notified regulators.


*  W.S. Hylton, "What Happened to Air France Flight 447?" New York Times Magazine (May 4, 2011).