Thursday, May 29, 2014

A Systems View of Two Industries: Nuclear and Air Transport

We have long promoted a systems view of nuclear facilities and the overall industry.  One consequence of that view is an openness to possible systemic problems as the root causes of incidents in addition to searching for malfunctioning components, both physical and human.

One system where we see this openness is the air transport industry—the air carriers and the Federal Aviation Administration (FAA).  The FAA has two programs for self-reporting of incidents and problems: the Voluntary Disclosure Reporting Program (VDRP) and the Aviation Safety Action Program (ASAP).  These programs are discussed in a recent report* by the FAA’s Office of Inspector General (OIG) and are at least superficially similar to the NRC’s Licensee Event Reporting and Employee Concerns Program.

What’s interesting is that VDRP is receptive to the reporting of both individual and systemic issues.  The OIG report says the difference between individual and systemic is “important because if the issue is systemic, the carrier will have to develop a detailed fix to address the system as a whole—whereas if the issue is more isolated or individual, the fix will be focused more at the employee level, such as providing counseling or training.” (p. 7)  In addition, it appears both FAA programs are imbued with the concept of a “just culture,” another topic we have posted about on several occasions and which is often associated with a systems view.  A just culture is one where people are encouraged to provide essential safety-related information, the blame game is aggressively avoided, and a clear line exists between acceptable and unacceptable behavior.

Now the implementation of the FAA programs is far from perfect.  As the OIG points out, the FAA doesn't ensure root causes are identified or corrective actions are sufficient and long-lived, and safety data is not analyzed to identify trends that represent risks.  Systemic issues may not always be reported by the carriers or recognized by the FAA.  But overall, there appears to be an effort at open, comprehensive communication between the regulator and the regulated.

So why does the FAA encourage a just culture while the nuclear industry seems fixated on a culture of blame?  One factor might be the NRC’s focus on hardware-centric performance measures.  If these are improving over time, one might infer that any incidents are more likely caused by non-hardware, i.e., humans. 

But perhaps we can gain greater insight into why one industry is more accepting of systemic issues by looking at system-level factors, specifically the operational (or actual) coupling among industry participants versus their coupling as perceived by external observers.**

As a practical matter, the nuclear industry is loosely coupled, i.e., each plant operates more or less independently of the others (even though plants with a common owner are subject to the same policies as other members of the fleet).  There is seldom any direct competition between plants.  However, the industry is viewed by many external observers, especially anti-nukes, as a singular whole, i.e., tightly coupled.  Insiders reinforce this view when they say things like “an accident at one plant is an accident for all.”  And, in fact, one incident (e.g., Davis-Besse) can have industry-wide implications although the physical risk may be entirely local.  In such a socio-political environment, there is implicit pressure to limit or encapsulate the causes of any incidents or irregularities to purely local sources and avoid the mention of possible systemic issues.  This leads to a search for the faulty component, the bad employee, a failure to update a specific procedure or some other local problem that can be fixed by improved leadership and oversight, clearer expectations, more attention to detail, training, etc.  The result of this approach (plus other industry-wide factors, e.g., the lack of transparency in certain oversight practices*** and the “special and unique” mantra) is basically a closed system whose client, i.e., the beneficiary of system efforts, is itself.

In contrast, the FAA’s world has two parts, the set of air carriers whose relationship with one another is loosely coupled, similar to the nuclear industry, and the air traffic control (ATC) sub-system, which is more tightly coupled because all the carriers share the same airspace and ATC.  Because of loose coupling, a systemic problem at a single carrier affects only that carrier and does not infect the rest of the industry.  What is most interesting is that a single airline accident (in the tightly coupled portion of the system) does not lead to calls to shut down the industry.  Air transport has no organized opposition to its existence.  Air travel is such an integral part of so many people’s lives that pressure exists to keep the system running even in the face of possible hazards.  As a consequence, the FAA has to occasionally reassert its interest in keeping safety risks from creeping into the system.  Overall, we can say the air transport industry is relatively open, able to admit the existence of problems, even systemic ones, without taking an inadvertent existential risk.

The foregoing is not intended to be a comprehensive comparison of the two industries.  Rather it is meant to illustrate how one can apply a simple systems concept to gain some insights into why participants in different industries behave differently.  While both the FAA and NRC are responsible for identifying systemic issues in their respective industries, it appears FAA has an easier time of it.  This is not likely to change given the top-level factors described above. 


*  FAA Office of Inspector General, “Further Actions are Needed to Improve FAA’s Oversight of the Voluntary Disclosure Reporting Program” Report No. AV-2014-036 (April 10, 2014).  Thanks to Bill Mullins for pointing out this report to us.

“VDRP provides air carriers the opportunity to voluntarily report and correct areas of non-compliance without civil penalty. The program also provides FAA important safety information that might not otherwise come to its attention.” (p. 1)  ASAP “allows individual aviation employees to disclose possible safety violations to air carriers and FAA without fear that the information will be used to take enforcement or disciplinary action against them.” (p. 2)

**  “Coupling” refers to the amount of slack, buffer or give between two items in a system.

***  For example, INPO’s board of directors is comprised of nuclear industry CEOs, INPO evaluation reports are delivered in confidence to its members and INPO has basically unfettered access to the NRC.  This is not exactly a recipe for gaining public trust.  See J.O. Ellis Jr. (INPO CEO), Testimony before the National Commission on the BP Deepwater Horizon Oil Spill and Offshore Drilling (Aug. 25, 2010).  Retrieved from NEI website May 27, 2014.

Thursday, May 22, 2014

GM Part 3 - Lawyers, Decision Making and...Simulation?

The GM story continues to unfold on a daily basis.  We’ve already lost track of the number of recalls as it appears that any and every possible safety defect from prior years has been added to the recall list.  This is reminiscent of the “problem” nuclear plants in the 1990s - NRC-mandated improvement programs precipitated an avalanche of condition reports into the plants’ Corrective Action Programs, requiring immense resources and time to sort out and prioritize the huge volume of issues.

In our prior post on GM product safety issues, we critiqued the structure of management’s independent review being conducted by attorney Anton Valukas based in part on the likelihood that GM’s legal department would be a subject of the review.  Asking the chairman of a law firm with a long-standing relationship with GM to pull this off seemed, at a minimum, unnecessary, and could potentially undermine the credibility of the assessment.  Now we see in further reporting of the GM issues by the New York Times* that in fact GM’s lawyers are becoming a key focus of the investigation.  The implication is that GM’s lawyers may have been the gatekeepers on information related to the Cobalt ignition switches and/or been enablers of a decision process that did not result in aggressive action.

Of greater interest is the Consent Order** entered into by GM and the United States Department of Transportation, National Highway Traffic Safety Administration.  The headline was the $35 million civil penalty but there were more interesting nuggets within the order.  Among a series of required actions by GM to improve timeliness and data to support safety defect evaluations were three actions specifically focusing on safety decision making.  One is to ensure that safety issues are expeditiously brought to the attention of “committees and individuals with authority to make safety recall decisions.” (p. 10)***  Second, GM will have to meet with the NHTSA on a monthly basis for one year to review its decision making on potential safety issues.  And third,

“GM shall meet with NHTSA no later than 120 calendar days after execution of this Consent Order to conduct simulations—i.e., an exercise to discuss hypothetical scenarios, for the purpose of assessing the effectiveness of the improvements [in processes and analytics to identify safety-related defects]…” (p. 9, emphasis added)

We find the emphasis of the Consent Order both fascinating and appropriate.  It emphasizes decision making - the process, timeliness, engagement of appropriate participants, and transparency - as essential to assuring appropriate outcomes.  It opens that process to scrutiny by the NHTSA through monthly reviews of actual decisions.  And most strikingly, it requires the conduct of decision simulations to verify the effectiveness of the improvements.

The provisions of the Consent Order establish a fundamentally new and better approach to rectifying deficiencies in safety performance and are consistent with themes we have been advocating for some time.  It departs from the simplistic - blame some individuals, reinforce expectations, emphasize values and improve processes - catechism that is pursued within the nuclear industry and others as well.  It seems to recognize that safety related decisions constitute the essence of assuring safety.  Rather than just reviewing and investigating bad outcomes, the Consent Order opens the door to making the results of all ongoing decisions transparent and reviewable.  Further it even calls for practicing the decision making process - through simulations - to verify the effectiveness of the process and the results.  Practicing complex and nuanced safety decisions to improve the process and decision making skills - what an idea.

It is no news flash to our readers that we have not only advocated these approaches, we have developed prototype tools for these purposes.  We have made the NuclearSafetySim simulation tool available for almost a year via this blog and linked to its website.  What has been the result?  While it is clear there have been many viewings of these materials, there has not been a single inquiry or follow-up by the nuclear industry, the NRC or INPO.****  At the same time there have been no initiatives within those groups to develop new or improved tools and methods for improving safety management.  Why?


*  B. Vlasic, “Inquiry by General Motors Is Said to Focus on Its Lawyers,” New York Times (May 17, 2014).  Retrieved May 22, 2014. 

**  Consent Order between the National Highway Traffic Safety Administration and General Motors Company re: NHTSA’s Timeliness Query TQ14-001 (May 16, 2014).

***  Including GM’s Executive Field Action Decision Committee and Field Performance Evaluation Recommendation Committee. (p. 9)

****  Ironically, the only serious interest has been expressed within the oil/gas industry which appears much more open to exploring innovative approaches.

Monday, May 19, 2014

GM Part 2

In our April 16, 2014 post we discussed the evolving situation at General Motors regarding the issues with the Chevy Cobalt’s ignition switches.  We highlighted the difficulties GM was encountering in piecing together how decisions were made regarding re-design and possible vehicle recalls, and who in the management chain was involved and/or aware of the issues.  As we noted, GM had initiated an internal investigation of the matter with the results expected by late May.

In a recent Wall Street Journal article* there is some further perspective on how things are moving forward.  For one, the GM Board has now instituted its own investigation of how information flowed to the Board and how it affected its oversight function.  An outside law firm is conducting that investigation.

Perhaps of more interest are some comments in the article regarding the separate investigation being conducted on behalf of GM’s management.  It is being conducted by a former U.S. attorney, Anton Valukas, who also happens to be Chairman of the law firm Jenner & Block.  The WSJ article notes “some governance experts have questioned whether Mr. Valukas has enough of an arm's-length relationship with GM management. Jenner & Block has long advised GM management.”  It does seem to raise a basic conflict of interest issue, providing legal services to GM and conducting an independent investigation.  But a source quoted in the WSJ article notes that GM does not see a problem since “Mr. Valukas' own integrity is on the line…”

In terms of the specific situation it seems fairly clear to us that Valukas should not be performing the investigation on behalf of management.  The Board of Directors should have initiated the primary investigation using an independent outside firm - essentially what it has now done but which is limited to the narrow issue of information flow to the Board.  Having current management sponsor an investigation of itself using a firm with commercial ties to GM will not result in high confidence in its findings.

In a broader sense this situation models the contours of a wider problem associated with ensuring safety in complex organizational systems.  In the GM case the assurance of a completely objective and thorough investigation seems to come down to the personal integrity of Mr. Valukas.  While we have no reason to doubt his credentials or integrity, he is being placed in a situation where an aggressive investigation could have negative impacts on GM and its management - who are clients of Mr. Valukas’ law firm.  In addition this investigation will involve products liability issues which inevitably involve GM’s internal lawyers; in all probability Valukas’ firm has professional relationships with these lawyers making it a particularly sensitive situation.  It is certainly possible that Mr. Valukas will be immune to any implicit pressures due to these circumstances, but it is an approach that puts maximum reliance on the individual to do the “right” thing notwithstanding competing interests.  And in any event, the perception of an investigation of this type will always be subject to some question where conflicts are present.

We also see an interesting analogy to nuclear operations where the reliance on safety culture is, in essence, reliance on personal integrity.  We are not implying there is anything wrong with expecting and emphasizing personal integrity; however, all too often it becomes a panacea for countering significant costs or other impacts to operations and ensuring safety is accorded proper priority.  And if things go wrong, it is the norm that individuals are blamed and often replaced.  In essence they failed the integrity test.  Why they failed, the elephant in the room, is hardly ever pursued.  Rarely if ever do corrective actions address minimizing or eliminating the influence of those conflicts, leaving the situation ripe for further failures.


*  J.S. Lublin and J. Bennett, “GM Directors Ask Why Cobalt Data Didn't Reach Them,” Wall Street Journal (May 14, 2014).

Monday, May 12, 2014

Willful Violations at Indian Point

We report in this post on a situation that developed at Indian Point more than two years ago and was just recently closed out via NRC notices of violation to an individual (a Chemistry Manager for Entergy Nuclear Operations) and to Entergy Nuclear Operations itself. 

What should we make of another willful misconduct episode?  A misguided individual who made some bad choices but where the actual impact on safety (per Entergy and the NRC) was not significant?  The individual resigned (and pleaded guilty to a felony and received probation), corrective actions to reinforce proper behaviors have been taken, and violations issued...what difference does it make?

The Events Surrounding the Misconduct

We are attaching a series of references as they contain more detail than we can recount in a blog post.  In particular Reference 4 provides the most comprehensive rendition of the relevant events.  Very briefly this is what occurred: During 2011, routine testing of diesel fuel oil at Indian Point (IP), as required by Tech Specs, indicated that the limits on particulate concentration were exceeded.  The Chemistry Manager with responsibility for this testing did not report the anomalous results (i.e., did not initiate Condition Reports); reporting them would have resulted in the reserve fuel oil storage tank (RFOST) being declared inoperable.  The LCO is 30 days and if operability was not restored, shutdown of both IP units would have been required. [Ref 2, Cover Letter]  In early 2012 as part of a systems engineering self-assessment, the anomalous results and lack of reporting were identified.  The Chemistry Manager falsely indicated that re-sampling and testing had been performed with acceptable results.  He subsequently made false data entries to support this story.

A short time later employee concerns were filed via the Entergy Ethics Line and the Employee Concerns Program (ECP).  Entergy initiated an investigation using outside attorneys (Morgan Lewis).  At the same time the NRC initiated an Office of Investigations (OI) investigation.  The Chemistry Manager refused to cooperate in the investigation and resigned.  Subsequent testing of the fuel oil indicated limits were being exceeded and compensatory actions were taken.  Pursuant to the investigations the Chemistry Manager admitted willful misconduct.  The US Attorney issued a criminal complaint and ultimately the manager pleaded guilty to a felony and received probation.  Entergy was cited for a Severity Level III violation, civil penalty waived.

Further Observations

Plowing through the documentation of this issue left us with a few lingering questions.  One is with regard to the sanitized LER that Entergy submitted to the NRC in August 2012.  The LER makes no mention of the filing of employee concerns, investigation by outside attorneys or the NRC OI investigation.  For that matter the LER never mentions that the cause of the event was willful misconduct by a department manager.  Rather it characterizes the situation in the abstract - as a failure to use the corrective action program.  In other words a whole lot was happening in the background which would cast the event in a different light, including its potential significance.*

While the cited violations are linked to the misconduct of the Chemistry Manager, it appears there had been ongoing issues within the Chemistry Department for some time: entering test data diligently, understanding the significance of the data, and initiating CRs.  “The circumstances surrounding the violations are of concern to the NRC because they indicate a lack of consideration for (and/or knowledge of) TS requirements by ENO Chemistry staff.  The NRC also noted that the Chemistry Manager would not have had the opportunity to commit the violations had ENO staff exhibited the proper regard for the site TS.”  [Ref 4, p. 4]  But in its chronology of events, Entergy contends that in March 2012 there was “no reason to question the integrity of former Chemistry Manager…” [Ref 4, Encl 2, slide 15].  Perhaps not the integrity, but what about management effectiveness?

Further context.  Entergy gives itself credit for how it responded to the evolving situation.  They highlight that a self-assessment team identified the anomalies (true), that employees raised concerns through established programs (true), that Entergy conducted an investigation (true).  [Ref 4, Encl 2, slide 35]  But what is missing is that normal business processes (management oversight, QA audits, or Chemistry Department personnel) did not identify the anomalies prior to the self-assessment; that employees felt the need to use the Ethics Line and the ECP rather than raising their concerns directly within the management chain; that upon discovery of the anomalies, it appears that Entergy went to great lengths to avoid declaring that the fuel oil did not meet specs.**  The net result is that the RFOST was able to be maintained as operable for almost three months before definitive action was taken to filter the oil. [Ref 4, Encl 2, slides 17-21]

Why?

The most interesting and relevant question posed by these events is why did the Chemistry Manager take the actions he did?  “The Manager said that he falsified the data because he needed more time to prove his theory [that the IP Chemistry Department’s sampling practices were poor] and incorporate new test methods, and he had not wanted the plant to unnecessarily shut down.”  [Ref 2, Encl 1] That is the extent of what the NRC reports on its investigation of the motive of the Chemistry Manager.  An employee for 29 years undertakes a series of deliberate violations of his professional responsibilities “to prove his theory”.  Perhaps. 

One of the final corrective actions implemented for this event occurred in December 2013 when the General Manager for Plant Operations briefed the Department Managers on deliberate misconduct.  Included was a statement, "If we have to shutdown the plant we will do so". [Ref 4, Encl 2, slide 32] Without reading too much into a single bullet point, one wonders if this is a tacit acknowledgment by Entergy that the Chemistry Manager may have been influenced to do what he did because he did not want to be the cause of a plant shutdown.

We would be very interested to see how much probing was done by the NRC investigators, or Entergy’s attorneys, of this individual’s motive, particularly in terms of any perceived pressure to keep the plant operating.  Such pressure needn’t come from Entergy; it seems self-evident that Indian Point’s licensing situation and the long-standing political opposition within New York State pose an existential threat to the plant.  If his motive was just a matter of a revised test “theory”, were these the first out-of-spec fuel oil test results on his watch?  If there had been others, how were they handled?  How long had he been in the position?  Had he initiated any other actions prior to this time to investigate the testing protocol?  As we noted in our post dated September 12, 2013 regarding the NRC’s Information Notice on willful violations, in none of the cited examples did the NRC provide any perspective on the motives of the individuals or the potential effects of the environment within which they were working.

Safety and Safety Culture

How does all of this shed any light on safety and safety culture? 

A key dimension of safety culture is the accurate assessment of safety significance.  The position of Entergy, and adopted by the NRC***, was that the actual impact of the violations on reactor safety was not significant. [Ref 4, Encl 2, slide 36]  Also note that NRC finds that all of this is in the ROP category for “green” significance. The argument is a familiar one.  TS limits are conservative and below what is actually “OK”.  And if particulates are a problem there are filters on the diesel generators, and these can be changed out during operation of the diesels if necessary.  This is a familiar characterization - safety significance is evaluated within the strict boundaries of the NRC’s safety construct of design basis assumptions, almost exclusively hardware based.  As we noted in our September 24, 2013 post, this ignores the larger environment and “system” within which people actually function. 

The Synergy Safety Culture Survey conducted from Feb to April 2012 is cited as finding a “healthy work environment in Chemistry Department” - yet this was at the very time test results were being falsified by the manager and employees were resorting to the ECP to raise issues.  Other assessments by the NRC and INPO also did not identify issues. [Ref 4, Encl 2, slide 29].  There is reference to an “independent investigation” of the employee concerns but the documentation does not reveal who did the investigation or its findings.  The investigation found “no one interviewed” had a reluctance to raise an issue.  Nowhere is the prior use of the Ethics Line and ECP by several individuals on an anonymous basis explained. 

It is hard to square the NRC assertion that there is a strong link between willful violations and safety culture with the results of these various assessments at Indian Point by Synergy, the NRC and INPO.  So if there is a link, and safety culture assessments don’t reveal its presence, are the assessments valid?  Or if the assessments are valid, is there really a link with willful misconduct?

Here’s our take.  Willful misconduct is an indication of an issue with the safety culture.  But the issue arises out of a broader and more complex context than the NRC or industry is willing to address.  At Indian Point there is an overriding operating context where the extension of the plants’ operating licenses is being contested by powerful political forces in New York State.  If the licenses are not extended, the plants close and people lose their jobs.  This is not theoretical as the Entergy-owned plant, Vermont Yankee, is doing just that.  If you are an employee at Indian Point, you must feel that pressure every day.  When an issue comes up such as failed diesel fuel tests that could result in temporary shutdown of both units, it is an additional threat to the viability of the plant.  That pressure can create a powerful desire to rationalize that the fuel tests are not valid and/or that slightly contaminated fuel isn’t a significant safety concern because…[see Entergy and NRC agreement that it is not a significant safety concern].  So there is a situation where there is an immediate and significant penalty (shutdown of both units) versus a test result that may or may not be valid or of real safety significance.  The result: deliberate misconduct in burying the test results but also very possibly (I am speculating) the individual and others in the organization can still believe that safety is not impacted.  As actions are consistent with “real” safety significance, it preserves the myth that safety culture is still healthy.


*  As stated in the NRC Enforcement Policy (on page 9, section 2.2.1.d): “Willful violations are of particular concern because the NRC’s regulatory program is based on licensees and their contractors, employees, and agents acting with integrity and communicating with candor. The Commission cannot tolerate willful violations. Therefore, a violation may be considered more significant than the underlying noncompliance if it includes indications of willfulness.” [NRC Information Notice 2013-15]

**  The sequence of events starting in March 2012 in response to RFOST sample (by off-site testing lab) being out of spec: the RFOST is declared inoperable but a supervisor declares that the sample test method was not appropriate, the department procedure is revised to allow on-site testing of a new sample (what was site review process? procedure revision appears to have occurred and become effective in one day), and the test results are now found acceptable.  This allows the RFOST to be declared operable. Without telling anyone, the former Chem Mgr sends a split sample for off-site testing and it comes back over spec.  Why wouldn’t plant management have required a split sample in the first place to verify on-site test?  Two employee concerns are filed, the ML investigation is initiated and the Chemistry Manager resigns.  At the next sampling in mid-April, once again the on-site analysis finds the sample to be within spec but management now requires outside testing in light of the resignation of the Chemistry Manager.  Outside test indicates out-of-spec but an “evaluation” concludes that the in-house results are valid and  RFOST remains “operable”.  Another month goes by and sample is taken in late May.  Sample sent outside, late June results indicate out-of-spec.  This time the RFOST is declared inoperable.  Not clear if late May sample was tested on-site (or why not) and why this time the outside test result is deemed valid.  A final footnote, one of the corrective actions for this event was to discontinue on-site oil analysis but no discussion of why, or why it had been approved in the first place.

***  “the underlying technical findings would have been evaluated as having very low safety significance (i.e. green) under the Reactor Oversight Process (ROP) because the higher fuel oil particulate concentration would not have impacted the ability of the EDGs to fulfill their safety function.” [Ref 4, p. 3]

References

1 - J.A. Ventosa (Entergy) to NRC, Licensee Event Report # 2012-007-00 (Aug. 20, 2012).  ADAMS ML12235A541.

2 - NRC to J. Ventosa, NRC Inspection Report Nos. 05000247/2013011 & 05000286/2013011 and NRC Office of Investigation Reports No. 1-2012-036 (Dec. 18, 2013).  ADAMS ML13354B806.

3 - NRC to D. Wilson (former Chemistry Mgr.), Notice of Violation and Order Prohibiting Involvement in NRC-Licensed Activities (April 29, 2014).  ADAMS ML14118A337.

4 - NRC to J. Ventosa, Notice of Violation (April 29, 2014).  ADAMS ML14118A124.

Monday, May 5, 2014

WIPP - Release the Hounds

(Ed. note: This is Safetymatters’ second post on the Phase 1 WIPP report.  Bob and I independently saw the report, concluded it raised important questions about DOE and its investigative process and headed for our keyboards.  We will try to get an official response to our posts—but don’t hold your breath.) 

Earlier this week the DOE released its Accident Investigation Report on the Radiological Release Event at the Waste Isolation Pilot Plant.  The report is a prodigious effort in the just over two months since the event.  It is also a serious indictment of DOE’s management of WIPP and arguably, the DOE itself.  There is however a significant flaw in the investigation and report: the investigators were kept on too tight a leash.  Itemization of failures, particularly pervasive failures, without pursuing how and why they occurred is not sufficient.  This gap also highlights the essence and value of systems analysis - identifying the fundamental dynamics that produced the failures and solutions that change those dynamics.

At first blush the issuance of yet another report on safety issues and safety management performance at a DOE facility would hardly merit a rush to the keyboard to dissect the findings.  Yet we believe this report is a tipping point in the pervasive and continuing issues at DOE facilities and should be a call for much more aggressive action.  It doesn’t take long for the report to get to the point in the Executive Summary:

“The Board identified the root cause of Phase 1 of the investigation of the release of radioactive material from underground to the environment to be NWP’s and CBFO’s management failure to fully understand, characterize, and control the radiological hazard.” [emphasis added] (p. ES-6)  NWP is Nuclear Waste Partnership, the contractor with direct management responsibility for WIPP operations, and CBFO is the Carlsbad Field Office of the DOE.

To complete the picture, the investigation board also found, as a contributing cause, that DOE Headquarters oversight was ineffective.  So in sum, the board found a total failure of the management system responsible for radiological safety at the WIPP.

Interestingly there has been a rather muted response to this report.  The DOE issued the report with a strikingly neutral press release quoting Matt Moury, Environmental Management Deputy Assistant Secretary, Safety, Security, and Quality Programs: “The Department believes this detailed report will lead WIPP recovery efforts as we work toward resuming disposal operations at the facility.”  And Joe Franco, DOE’s Carlsbad Field Office Manager: “We understand the importance of these findings, and the community’s sense of urgency for WIPP to become operational in the future.”*  (We note that both statements focus on resumption of operations versus correction of deficiencies.)  New Mexico’s U.S. Senators Udall and Heinrich called the findings “deeply troubling” but then simply noted that they expected DOE management to take the necessary corrective actions.**  If there is any sense of urgency we would think it might be directed at understanding how and why there was such a total management failure at the WIPP.

To fully appreciate the range and depth of failures associated with this event one really needs to read the board’s report.  Provided below is a brief summary of some of the highlights that illustrate the identified issues:

-    Implementation of the NWP Conduct of Operations Program is not fully compliant with DOE policy;
-    NWP does not have an effective Radiation Protection Program in accordance with 10 Code of Federal Regulations (CFR) 835, Occupational Radiation Protection;
-    NWP does not have an effective maintenance program;
-    NWP does not have an effective Nuclear Safety Program in accordance with 10 CFR 830 Subpart B, Safety Basis Requirements;
-    NWP implementation of DOE O 151.1C, Comprehensive Emergency Management System, was ineffective;
-    The current site safety culture does not fully embrace and implement the principles of DOE Guide (G) 450.4-1C, Integrated Safety Management Guide [note: findings consistent with findings of the 2012 SCWE self assessment results]; and DOE oversight of NWP was ineffective;
-    Execution of CBFO oversight in accordance with DOE O 226.1B was ineffective; and
-    As previously mentioned, DOE Headquarters (HQ) line management oversight was ineffective. (pp. ES 7-8)

Many of the specific deficiencies cited in the report are not point-in-time occurrences but stem from chronic and ongoing weaknesses in programs, personnel, facilities and resources.

Losing the Scent

As mentioned in the opening paragraph we feel that while the report is of significant value it contains a shortcoming that will likely limit its effectiveness in correcting the identified issues.  In so many words the report fails to ask “Why?”  The report is a massive catalogue of failures yet never fully pursues the ultimate and most relevant question: Why did the failures occur?  One almost wonders how the investigators could stop short of systematic and probing interviews of key decision makers.

For example in the maintenance area, “The Board determined that the NWP maintenance and engineering programs have not been effective…”; “Additionally, configuration management was not being maintained or adequately justified when changes were made.”; “There is an acceptance to tolerate or otherwise justify (e.g., lack of funding) out-of-service equipment.” (p. 82)  And that’s where the analysis stops. 

Unfortunately (but predictably) what follows from the constrained analysis are equally unfocused corrective actions based on the following linear construct: “this is a problem - fix the problem”.  Even the corrective action vocabulary becomes numbingly sterile: “needs to take action to ensure…”, “needs to improve…”, “need to develop a performance improvement plan…”,  “needs to take a more proactive role…”.

We do not want to be overly critical as the current report reflects a little over two months of effort and may not have afforded sufficient time to pull the string on so many issues.  But it is time to realize that these types of efforts are not sufficient to understand, and therefore ultimately correct, the issues at WIPP and DOE and institutionalize an effective safety management system.


*  DOE press release, “DOE Issues WIPP Radiological Release Investigation Report” (April 24, 2014).  Retrieved May 5, 2014.

**  Senators Udall and Heinrich press release, “Udall, Heinrich Statement on Department of Energy WIPP Radiological Release Investigation Report” (April 24, 2014).  Retrieved May 5, 2014.

Saturday, May 3, 2014

DOE Report on WIPP's Safety Culture

On Feb. 14, 2014, an incident at the Department of Energy (DOE) Waste Isolation Pilot Plant (WIPP) resulted in the release of radioactive americium and plutonium into the environment.  This post reviews DOE’s Phase 1 incident report*, with an emphasis on safety culture (SC) concerns.

From the Executive Summary

The Accident Investigation Board (the Board) concluded that a more thorough hazard analysis, coupled with a better filter system, could have prevented the unfiltered above ground release. (p. ES-1)

The root cause of the incident was Nuclear Waste Partnership’s (NWP**, the site contractor) and the DOE Carlsbad Field Office’s (CBFO) failure to manage the radiological hazard. “The cumulative effect of inadequacies in ventilation system design and operability compounded by degradation of key safety management programs and safety culture [emphasis added] resulted in the release of radioactive material . . . and the delayed/ineffective recognition and response to the release.” (pp. ES 6-7)

The report presents eight contributing causes, most of which point to NWP deficiencies.  SC was included as a site-wide concern; specifically, the SC does not fully implement DOE safety management policy: “[t]here is a lack of a questioning attitude, reluctance to bring up and document issues, and an acceptance and normalization of degraded equipment and conditions.”  A recent Safety Conscious Work Environment (SCWE) survey suggests a chilled work environment. (p. ES-8)

The report includes 31 conclusions, 4 related to SC.  “NWP and CBFO have allowed the safety culture at the WIPP project to deteriorate . . . Questioning attitudes are not welcomed by management . . . DOE has exacerbated the safety culture problem by referring to numbers of [problem] reports . . . as a measure of [contractor] performance . . . . [NWP and CBFO] failed to identify weaknesses in . . . safety culture.” (pp. ES 14-15, 19-20)

The report includes 47 recommendations (called Judgments of Need) with 4 related to SC.  They cover leadership (including the CBFO site manager) behavior, organizational learning, questioning attitude, more extensive use of existing processes to raise issues, engaging outside SC expertise and improving contractor SC-related processes. (ibid.)

Report Details

The body of the report presents the details behind the conclusions and recommendations.  Following are some of the more interesting SC items, starting with our hot button issues: decision making (esp. the handling of goal conflict), corrective action, compensation and backlogs. 

Decision Making

The introduction to section 5 on SC includes an interesting statement:  “In normal human behavior, production behaviors naturally take precedence over prevention behaviors unless there is a strong safety culture - nurtured by strong leadership.” (p. 61)

The report suggests nature has taken its course: WIPP values production first and most.  “Eighteen emergency management drills and exercises were cancelled in 2013 due to an impact on operations. . . . Management assessments conducted by the contractor have a primary focus on cost and schedule performance.” (p. 62)  “The functional checks on CAMs [continuous air monitors] were often delayed to allow waste-handling activities to continue.” (p. 64)  “[D]ue consideration for prioritization of maintenance of equipment is not given unless there is an immediate impact on the waste emplacement processes.” (p. ES-17)  These observations evidence an imbalance between the goals of production and prevention (against accidents and incidents) and, following the logic of the introductory statement, a weak SC.

Corrective Action

The corrective action program has problems.  “The [Jan. 2013] SCWE Self-Assessment . . . identified weaknesses in teamwork and mutual respect . . . Other than completing the [SCWE] National Training Center course, . . . no other effective corrective actions have been implemented. . . . [The Self-Assessment also] identified weaknesses in effective resolution of reported problems.” (p. 63)  For problems that were reported, “The Board noted several instances of reported deficiencies that were either not issued, or for which corrective action plans were not developed or acted on for months.” (p. 65)

Compensation

Here is the complete text of Conclusion 14, which was excerpted above: “DOE has exacerbated the safety culture problem by referring to numbers of ORPS [incident and problem] reports and other deficiency reporting documents, rather than the significance of the events, as a measure of performance by Source Evaluation Boards during contract bid evaluations, and poor scoring on award fee determinations.  Directly tying performance to the number of occurrence reports drives the contractor to non-disclosure of events in order to avoid the poor score. [emphasis added]  This practice is contrary to the Department’s goals of the development and implementation of a strong safety culture across our projects.” (p. ES-15)  ‘Nuff said. 

Backlogs

Maintenance was deferred if it interfered with production.  Equipment and systems were allowed to degrade. (pp. ES-7, ES-17, C-7)  There is no indication that maintenance backlogs were a problem; the work simply wasn’t done.

Other SC Issues

In addition to our Big Four and the issues cited from the Executive Summary, the report mentions the following concerns.  (A listing of all SC deficiencies is presented on p. D-3.)

  • Delay in recognizing and responding to events,
  • Bias for negative conclusions on Unreviewed Safety Question Determinations, and
  • Infrequent presence of NWP management in the underground and surface.

Our Perspective

For starters, the Board appears to have a limited view of what SC is.  They see it as a cause of many of WIPP's problems but believe it can be fixed if it is “nurtured by strong leadership” and the report's recommendations are implemented.  The recommendations are familiar and can be summed up as “Row harder!”***  In reality, SC is both cause (it creates the context for decision making) and consequence (it is influenced by the observed actions of all organization members, not just senior management).  SC is an organizational property that cannot be managed directly.

The report is a textbook example of linear, deterministic thinking, especially Appendix E (46 pgs.) on events and causal factors related to the incident.  The report is strong on what happened but weak on why things happened.  Going through Appendix E, SC is a top-level blanket cause of nuclear safety program and radiological event shortcomings (and, to a lesser degree, ventilation, CAMs and ground control problems) but there is no insight into how SC interacts with other organizational variables or with WIPP’s external (political, regulatory, DOE policy) environment. 

Here’s an example of what we’re talking about, viz., how one might gain some greater insight into a problem by casting a wider net and applying a bit of systems thinking.  The report faults DOE HQ for ineffective oversight, providing inadequate resources and not holding CBFO accountable for performance.  The recommended fix is for DOE HQ “to better define and execute their roles and responsibilities” for oversight and other functions. (p. ES-21)  That’s all what and no why.  Is there some basic flaw in the control loop involving DOE HQ, CBFO and NWP?  DOE HQ probably believes it transmits unambiguous orders and expectations through its official documents—why weren’t they being implemented in the field and why didn’t DOE know it?  Is the information flow from DOE to CBFO to NWP clear and adequate (policies, goals); how about the flow in the opposite direction (performance feedback, problems)?  Is something being lost in the translation from one entity to another?  Does this control problem exist between DOE HQ and other sites, i.e., is it a systemic problem?  Who knows.****

Are there other unexamined factors that make WIPP's problems more likely?  For example, has WIPP escaped the scrutiny and centralized controls that DOE applies to other entities?  As a consequence, has WIPP had too much autonomy to adjust its behavior to match its perception of the task environment?  Are DOE’s and WIPP’s mental models of the task environment similar or even adequate?  Perhaps WIPP (and possibly DOE) see the task environment as simpler than it actually is, and therefore the strategies for handling the environment lack requisite variety.  Was there an assumption that NWP would continue the apparently satisfactory performance of the previous contractor?  It's obvious these questions do not specifically address SC but they seek to ascertain how the organizations involved are actually functioning, and SC is an important variable in the overall system.

Contrast with Other DOE SC Investigations 


This report presents a sharp contrast to the foot-dragging that takes place elsewhere in DOE.  Why can’t DOE bring a similar sense of urgency to the SC investigations it is supposed to be conducting at its other facilities?  Was the WIPP incident that big a deal (because it involved a radioactive release) or is it merely something that DOE can wrap its head around?  (After all, WIPP is basically an underground warehouse.)  In any event, something rang DOE’s bell because they quickly assembled a 5-member board with 16 advisor/consultants and produced a 300-page report in less than two months.*****

Bottom line: You don't need to pore over this report but it provides some perspective on how DOE views SC and demonstrates that a giant agency can get moving if it's motivated to do so.


*  DOE Office of Environmental Management, “Accident Investigation Report: Radiological Release Event at the Waste Isolation Pilot Plant on February 14, 2014, Phase 1” (April 2014).  Retrieved April 30, 2014.  Our thanks to Mark Lyons who posted this report on the LinkedIn Nuclear Safety group discussion board.

**  NWP LLC was formed by URS Energy and Construction, Inc. and Babcock & Wilcox Technical Services Group, Inc.  Their major subcontractor is AREVA Federal Services, LLC.  All three firms perform work at other, i.e., non-WIPP, DOE facilities.  NWP assumed management of WIPP on Oct. 1, 2012.  From NWP website.  Retrieved May 2, 2014.

***  To the Board's credit, they did not go looking for individual scapegoats to blame for WIPP's difficulties.

****  In fairness, the report has at least one example of a feedback loop in the CBFO-NWP sub-system: CBFO's use of the condition reports as an input to NWP’s compensation review and NWP's predictable reaction of creating fewer condition reports.

*****  The Accident Investigation Board was appointed on Feb. 27, 2014 and completed its Phase 1 investigation on March 28, 2014.  The Phase 1 report was released to the public on April 22, 2014.