Thursday, June 25, 2015

Safety Culture at Arkansas Nuclear One

Arkansas Nuclear One (credit: Edibobb)
Everyone has heard about the March 31, 2013 stator drop at Arkansas Nuclear One (ANO).  But there was also unsatisfactory performance with respect to flood protection and unplanned scrams.  As a consequence, ANO has been assigned to column 4 of the NRC’s Action Matrix where it will receive the highest level of oversight for an operating plant.

When a plant is in column 4 the NRC takes a particular interest in its safety culture (SC) and ANO is no exception.  NRC required ANO to have an independent (i.e., outside third party) SC assessment, which was conducted starting in late 2014.  While the assessment report is not public, some highlights were discussed during the May 21, 2015 NRC staff briefing of the Commissioners on the results of the April 15, 2015 Agency Action Review Meeting.*

NRC Presentation

The bulk of the staff presentation was a soporific review of agency progress in a variety of areas.  But when the topic turned to ANO, the Regional Administrator responsible for ANO was quite specific and minced no words.  Following are the key problems he reviewed.  See if you can connect the dots on SC issues based on these artifacts.

Let’s start with the stator drop.  ANO’s initial root cause evaluation did not identify any root or contributing causes related to ANO’s own performance, but rather focused solely on the contractor.  After the NRC identified ANO’s failure to follow its load handling procedure, ANO conducted another root cause evaluation and identified their own organizational performance issues such as inadequate project oversight and non-conservative decision making. (pp. 28-29)

The stator drop damaged a fire main which caused localized flooding.  This led to an extended condition review which identified various equipment and structures that could be subject to flooding.  The NRC inspectors pointed out deficiencies in the condition review and identified corrective actions that likely would not work.  In addition, earlier flooding walkdowns completed as part of the NRC’s post-Fukushima requirements failed to identify the majority of the flood protection deficiencies.  These walkdowns were also performed by a contractor.  (pp. 29-31)

Finally, ANO did not report an April 2014 Unit 2 trip as an unplanned scram because the trip occurred during a planned down power evolution.  After prodding by the NRC inspectors, ANO reclassified this event as an unplanned scram. (pp. 31-32)

Overall, the NRC felt it was driving ANO to perform complete evaluations and develop effective corrective actions.  NRC believes that ANO’s “cause evaluations typically don't provide for a thorough assessment of organizational and programmatic contributors to events or issues.” (p. 35)  Later, in response to a question, the Regional Administrator said “I think the licensee clearly needs to own the performance gaps, ensure that their assessments in the various areas are comprehensive and then identify appropriate actions, and then engage and ensure those actions are effective. . . . I don't want to be in a position where our inspection activities are the means for identifying the performance gaps.” (p. 44)

Responding to a question about ANO’s independent SC assessment, he said “one of the key findings . . . was that there's an urgent need to internalize and communicate the seriousness of performance problems and engage the site in their strategy for improvement.” (p. 45)

Entergy Presentation

A team of Entergy (ANO’s owner) senior managers presented their action plan for ANO.  They said they would own their own problems, improve contractor oversight, identify their own issues, increase corporate oversight and improve their CAP.

With respect to culture, they said “We're going to change the culture to promote a healthy, continuous improvement and to not only achieve, but also to sustain excellence.” (pp. 70-71)  They are benchmarking other plants, analyzing ANO’s issues and adding resources including people with plant performance recovery experience. 

They took comfort from the SC assessment conclusion that "although weaknesses exist, the overall safety culture at ANO is sufficient to support safe operation." (p. 72)

In response to a question about important takeaways from the SC assessment, Entergy referred to the need for the plant to recognize that performance has got to improve, the CAP must be more effective and organizational programmatic elements are important.  In addition, they vowed to align the organization on the performance gaps (and their significance) and establish a sense of urgency in order to fix them. (pp. 80-81)

Our Perspective

Not to be too cynical, but what else could Entergy say?  When your plant is in column 4, a mega mea culpa is absolutely necessary.  But Entergy’s testimony read like generic management arm-waving invoking the usual set of fixes.

Basically, the ANO culture endorses a “blame the contractor” attitude, accepts incomplete investigations into actual events and potential problems, and is content to let the NRC point out problems for them.  Where did those values come from?  Is “increased oversight” sufficient to create a long-term fix?

ANO naturally gives a lot of weight to the SC assessment because its findings appear relatively simple and apparently actionable.  Somewhat surprisingly, the NRC also appears to give this assessment broad credibility.  We think that's misplaced.  The chances are slim that such an assessment identified deep, systemic cultural issues, although we admit we don't know the assessment details.  Did the assessment team perform document reviews and conduct focus groups or interviews?  If it was only a survey, it identified no more than the most pressing issues in the plant's safety climate.

Taking a more systemic view, we note that Entergy has a history of SC issues over many plants in its fleet.  Check out our Feb. 20, 2015 post for highlights on some of their problems.  Are ANO’s problems just the latest round of SC Whac-A-Mole at Entergy?

Entergy has always had a strong Operations focus at its plants.  The NRC’s confidence in ANO’s operators is the main reason that plant is not shut down.  But continuously glorifying the operators, particularly their ability to respond successfully to challenging conditions, is like honoring firefighters while ignoring the fire marshal.  The fire marshal role at a nuclear plant is played by Engineering and Maintenance, groups whose success is hidden (thus under-appreciated) in an ongoing series of dynamic, non-events, viz., continuous safe plant operation.  That’s a cultural issue.  By the way, who gets the lion’s share of praise and highest status at your plant?


*  “Briefing on Results of the Agency Action Review Plan Meeting,” public meeting transcript (May 21, 2015).  ADAMS ML15147A041.

The Agency Action Review Meeting (AARM) “is a meeting of the senior leadership of the agency, and its goals are to review the appropriateness of agency actions taken for reactor material licensees with significant performance issues.” (pp. 3-4)

Tuesday, June 9, 2015

Training....Yet Again

U.S. Navy SEALS in Training
We have beaten the drum on the value of improved and innovative training techniques for safety management performance for some time, really since the inception of this blog, where our paper "Practicing Nuclear Safety Management"* was one of the seminal perspectives we wanted to bring to our readers.  We continue to encounter knowledgeable sources that advocate practice-based approaches, and we continue to bring them to our readers' attention.  The latest is an article from the Harvard Business Review that calls attention to, and distinguishes, "training" as an essential dimension of organizational learning: "How the Navy SEALS Train for Leadership Excellence."**  The author, Michael Schrage,*** is a research fellow at MIT who reached out to a former SEAL, Brandon Webb, who transformed SEAL training.  Schrage contends that training, as opposed to just education or knowledge, is necessary to promote deep understanding of a business, market or process.  Training in this sense means actually performing and practicing necessary skills.  It is the key to achieving high levels of performance in complex environments.

One of Webb’s themes that really struck a chord was: “successful training must be dynamic, open and innovative…. ‘It’s every teacher’s job to be rigorous about constantly being open to new ideas and innovation’, Webb asserts.”  It is very hard to think of much of the nuclear industry’s training on safety culture and related issues as meeting these criteria.  Even the auto industry has recently stepped up to require the conduct of decision simulations to verify the effectiveness of corrective actions in the wake of the ignition switch-related accidents. (See our May 22, 2014 post.)

In particular, the reluctance of the nuclear industry and its regulator to address the presence and impact of goal conflicts on safety continues to perplex us and, we hope, many others in the industry.  It was on the mind of Carlo Rusconi more than a year ago when he observed: “Some of these conflicts originate high in the organization and are not really amenable to training per se.” (See our Jan. 9, 2014 post.)  However, a certain type of training could be very effective in neutralizing such conflicts: practicing making safety decisions against realistic, fact-based scenarios.  As we have advocated on many occasions, this process would actualize safety culture principles in the context of real operational situations.  For the reasons cited by Rusconi, it builds teamwork and develops shared viewpoints.  If, as we have also advocated, both operational managers and senior managers participated in such training, senior management would be on the record for its assessment of the scenarios, including how they weighed, incorporated and assessed conflicting goals in their decisions.  This could have the salutary effect of empowering lower level managers to make tough calls where assuring safety has real impacts on other organizational priorities.

Perhaps senior management would prefer to simply preach goals and principles, and leave the tough balancing necessary to implement the goals to their management chain.  If decisions become shaded in the “wrong” direction but there are no bad outcomes, senior management looks good.  But if there is a bad outcome, lower level managers can be blamed, more “training” prescribed, and senior management can reiterate its “safety is the first priority” mantra.


*  In the paper we quote from an article that highlighted how poorly most experts perform in complex situations: “Most experts made things worse.  Those managers who did well gathered information before acting, thought in terms of complex-systems interactions instead of simple linear cause and effect, reviewed their progress, looked for unanticipated consequences, and corrected course often.  Those who did badly relied on a fixed theoretical approach, did not correct course and blamed others when things went wrong.”  Wall Street Journal, Oct. 22, 2005, p. 10, regarding Dietrich Dörner’s book The Logic of Failure.  For a comprehensive review of the practice of nuclear safety, see our paper “Practicing Nuclear Safety Management” (March 2008).

**  M. Schrage, "How the Navy SEALS Train for Leadership Excellence," Harvard Business Review (May 28, 2015).

***  Michael Schrage, a research fellow at MIT Sloan School’s Center for Digital Business, is the author of the book Serious Play among others.  Serious Play refers to experiments with models, prototypes, and simulations.

Friday, June 5, 2015

NRC Staff Review of National Research Council Safety Culture Recommendations Arising from Fukushima

On July 30, 2014 we reviewed the safety culture (SC) aspects of the National Research Council report on lessons learned from the Fukushima nuclear accident.  We said the report’s SC recommendations were pretty limited: the NRC and industry must maintain and monitor a strong SC in all safety-related activities, the NRC must maintain its independence from outside influences, and the NRC and industry should increase their transparency about their SC-related efforts.

The NRC staff reviewed the report’s recommendations, assessed whether the agency was addressing them and documented their results.*  Given the low bar, it’s no surprise the staff concluded “that all NAS’s recommendations are being adequately addressed.” (p.1)  Following is the evidence the staff assembled to show the NRC is addressing the SC recommendations.

Emphasis on Safety Culture (pp. 25-26) 


In 1989, after Peach Bottom plant operators were caught sleeping on the job, the NRC issued a “Policy Statement on the Conduct of Nuclear Power Plant Operations.”   The policy statement focused on personal dedication and accountability but also underscored management’s responsibility for fostering a healthy SC.

In 1996, after Millstone whistleblowers faced retaliation, the NRC issued another policy statement, “Freedom of Employees in the Nuclear Industry to Raise Safety Concerns without Fear of Retaliation.”  This policy statement focused on the NRC’s expectation that all licensees will establish and maintain a safety-conscious work environment (SCWE).

In 2002, after discovery of the Davis-Besse reactor pressure vessel’s degradation, the Reactor Oversight Process (ROP) was strengthened to detect potential SC weaknesses during inspections and performance assessments.  ROP changes were described in Regulatory Issue Summary 2006-13, “Information on the Changes Made to the Reactor Oversight Process to More Fully Address Safety Culture.”

In 2004, INPO published “Principles for a Strong Nuclear Safety Culture.”  In 2009, an industry/NEI/INPO effort produced a process for monitoring and improving SC, documented in NEI 09-07 “Fostering a Strong Nuclear Safety Culture.”  We reviewed NEI 09-07 on Jan. 6, 2011.

In 2008, the NRC initiated an effort to define and expand SC policy.  The final Safety Culture Policy Statement (SCPS) was published on June 14, 2011.  We posted eight times on the SCPS effort before the policy was issued.  Click on the SC Policy Statement label to see both those posts and subsequent ones that refer to the SCPS. 

An Independent Regulator (pp. 26-27)

The Energy Reorganization Act of 1974 established the NRC.  Principal Congressional oversight of the agency is performed by the Senate Subcommittee on Clean Air and Nuclear Safety, and the House Subcommittee on Energy and the Environment.  It’s not clear how the NRC performing obeisance before these committees contributes to the agency’s independence.

The NRC receives independent oversight from the NRC’s Office of the Inspector General and the U.S. Government Accountability Office.

Perhaps most relevant, the U.S. is a contracting party to the international Convention on Nuclear Safety.  The NRC prepares a periodic report describing how the U.S. fulfills its obligations under the CNS, including maintaining the independence of the regulatory body.  On March 26, 2014 we posted on the NRC’s most recent report.

Industry Transparency (pp. 27-28)

For starters, the NRC touts its SC website which includes the SCPS and SC-related educational and outreach materials.

In March 2014, the NRC published NUREG-2165, “Safety Culture Common Language,” which documents a common language to describe SC in the nuclear industry.  We reviewed the NUREG on April 6, 2014.

That’s all.

Our Perspective 


We’ll give the NRC a passing grade on its emphasis on SC.  The “evidence” on agency independence is slim.  Some folks believe that regulatory capture has occurred, to a greater or lesser degree.  For what it’s worth, we think the agency is fairly independent.

The support for industry transparency is a joke.  As we said in our July 30, 2014 post, “the nuclear industry’s penchant for secrecy is a major contributor to the industry being its own worst enemy in the court of public opinion.”     


*  NRC Staff Review of National Academy of Sciences Report, “Lessons Learned from the Fukushima Dai-ichi Nuclear Accident for Improving Safety of U.S. Nuclear Plants” (Apr. 9, 2015).  ADAMS ML15069A600.  The National Research Council is part of the National Academy of Sciences.

Tuesday, May 26, 2015

Safety Culture “State of the Art” in 2002 per NUREG-1756

Here’s a trip down memory lane.  Back in 2002 a report* on the “state of the art” in safety culture (SC) thinking, research and regulation was prepared for the NRC Advisory Committee on Reactor Safeguards.  This post looks at some of the major observations of the 2002 report and compares them with what we believe is important today.

The report’s Abstract provides a clear summary of the report’s perspective:  “There is a widespread belief that safety culture is an important contributor to the safety of operations. . . . The commonly accepted attributes of safety culture include good organizational communication, good organizational learning, and senior management commitment to safety. . . . The role of regulatory bodies in fostering strong safety cultures remains unclear, and additional work is required to define the essential attributes of safety culture and to identify reliable performance indicators.” (p. iii) 

General Observations on Safety Performance 


A couple of quotes included in the report reflect views on how safety performance is managed or influenced.

 “"The traditional approach to safety . . . has been retrospective, built on precedents. Because it is necessary, it is easy to think it is sufficient.  It involves, first, a search for the primary (or "root") cause of a specific accident, a decision on whether the cause was an unsafe act or an unsafe condition, and finally the supposed prevention of a recurrence by devising a regulation if an unsafe act,** or a technical solution if an unsafe condition." . . . [This approach] has serious shortcomings.  Specifically, ". . . resources are diverted to prevent the accident that has happened rather than the one most likely to happen."” (p. 24)

“"There has been little direct research on the organizational factors that make for a good safety culture. However, there is an extensive literature if we make the indirect assumption that a relatively low accident plant must have a relatively good safety culture." The proponents of safety culture as a determinant of operational safety in the nuclear power industry rely, at least to some degree, on that indirect assumption.” (p. 37) 

Plenty of people today behave in accordance with the first observation and believe (or act as if they believe) the second one.  Both contribute to the nuclear industry’s unwillingness to consider new ways of thinking about how safe performance actually occurs.

Decision Making, Goal Conflict and the Reward System

Decision making processes, recognition of goal conflicts and an organization’s reward system are important aspects of SC and the report addressed them to varying degrees.

One author referenced had a contemporary view of decision making, noting that “in complex and ill-structured risk situations, decisionmakers are faced not only with the matter of risk, but also with fundamental uncertainty characterized by incompleteness of knowledge.” (p. 43)  That’s true in great tragedies like Fukushima and lesser unfortunate outcomes like the San Onofre steam generators.

Goal conflict was mentioned: “Managers should take opportunities to show that they will put safety concerns ahead of power production if circumstances warrant.” (p.7)

Rewards should promote good safety practices (p. 6) and be provided for identifying safety issues. (p. 37)  However, there is no mention of the executive compensation system.  As we have argued ad nauseam these systems often pay more for production than for safety.

The Role of the Regulator


“The regulatory dilemma is that the elements that are important to safety culture are difficult, if not impossible, to separate from the management of the organization.  [However,] historically, the NRC has been reluctant to regulate management functions in any direct way.” (pp. 37-38)  “Rather, the NRC " . . . infers licensee organization management performance based on a comprehensive review of inspection findings, licensee amendments, event reports, enforcement history, and performance indicators."” (p. 41)  From this starting point, we now have the current situation where the NRC has promulgated its SC Policy Statement and practices de facto SC regulation using the highly reliable “bring me another rock” method.

The Importance of Context when Errors Occur 


There are hints of modern thinking in the report.  It contains an extended summary of Reason’s work in Human Error.  The role of latent conditions, human error as consequence instead of cause, the obvious interaction between producers and production, and the “non-event” of safe operations are all mentioned. (p. 15)  However, a “just culture” or other more nuanced views of the context in which safety performance occurs had yet to be developed.

One author cited described “the paradox that culture can act simultaneously as a precondition for safe operations and an incubator for hazards.” (p. 43)  We see that in Reason and also in Hollnagel and Dekker: people going about business as usual with usually successful results but, on some occasions, with unfortunate outcomes.

Our Perspective

The report’s author provided a good logic model for getting from SC attributes to identifying useful risk metrics, i.e., from SC to one or more probabilistic risk assessment (PRA) parameters.  (pp. 18-20)  But none of the research reviewed completed all the steps in the model. (p. 36)  He concludes “What is not clear is the mechanism by which attitudes, or safety culture, affect the safety of operations.” (p. 43)  We are still talking about that mechanism today.   

But some things have changed.  For example, probabilistic thinking has achieved greater penetration and is no longer the sole province of the PRA types.  It’s accepted that Black Swans can occur (but not at our plant).

Bottom line: Every student of SC should take a look at this.  It includes a good survey of 20th century SC-related research in the nuclear industry and it’s part of our basic history.

“Those who cannot remember the past are condemned to repeat it.” — George Santayana (1863-1952)


*  J.N. Sorensen, “Safety Culture: A Survey of the State-of-the-Art,” NUREG-1756 (Jan. 2002).  ADAMS ML020520006.  (Disclosure: I worked alongside the author on a major nuclear power plant litigation project in the 1980s.  He was thoughtful and thorough, qualities that are apparent in this report.)

**  We would add “or reinforcing an existing regulation through stronger procedures, training or oversight.”

Monday, April 27, 2015

INPO’s View on Fukushima Safety Culture Lessons Learned

In November 2011 the Institute of Nuclear Power Operations (INPO) published a special report* on the March 2011 Fukushima accident.  The report provided an overview and timeline for the accident, focusing on the evolution of the situation during the first several days after the earthquake and tsunami.  Safety culture (SC) was not mentioned in the report.

In August 2012 INPO issued an addendum** to the report covering Fukushima lessons learned in eight areas, including SC.  Each area contains a lengthy discussion of relevant plant activities and experiences, followed by specific lessons learned.  According to INPO, some lessons learned may be new or different from those published elsewhere.  Several caught our attention as we paged through the addendum: Invest resources to assess low-probability, high-consequence events (Black Swans).  Beef up available plant staffing to support regular staff in case a severe, long duration event inconveniently occurs on a weekend.  Evaluate the robustness of off-site event management facilities (TEPCO’s was inaccessible, lost power and did not have filtered ventilation).  Be aware that assigning most decision making authority to the control room crew (as TEPCO did) meant other plant groups could not challenge or check ops’ decisions—efficiency at the cost of thoroughness.  Conduct additional training for a high-dose environment when normal dosage limits are replaced with emergency ones.  Ensure that key personnel have in-depth reactor and power plant knowledge to respond effectively if situations evolve beyond established procedures and flexibility is required.

Focusing on SC, the introduction to this section is clear and unexpectedly strong: “History has shown that accidents and their precursors at commercial nuclear electric generating stations result from a series of decisions and actions that reflect flaws in the shared assumptions, values, and beliefs of the operating organization.” (p. 33)

The SC lessons learned are helpful.  INPO observed that while TEPCO had taken several steps over the years to strengthen its SC, it missed big picture issues including cultivating a questioning attitude, challenging assumptions, practicing safety-first decision making and promoting organizational learning.  In each of these areas, the report covers specific deficiencies or challenges faced at Fukushima followed by questions aimed at readers asking them to consider if similar conditions exist or could exist at their own facilities.

Our Perspective

The addendum has a significant scope limitation: it does not address public policy (e.g., regulatory or governmental) factors that contributed to the Fukushima accident and yielded their own lessons learned.***  However, given the specified scope, a quick read of the entire addendum suggests it’s reasonably thorough; the SC section certainly is.  The questions aimed at report readers are the kind we ask all the time on Safetymatters, and we award INPO full marks for addressing these general, qualitative, open-ended subjects.  One question INPO raised that we have not specifically asked is “To what extent are the safety implications considered during enterprise business planning and budgeting?” (italics added)  Another, inferred from the report text, is “How do operators create complex, realistic scenarios (e.g., with insufficient information and/or personnel under stress) during emergency training?”  These are legitimate additions to the repertoire.

The addendum is not perfect.  For example, INPO trots out the “special and unique” mantra when discussing the essential requirements to maintain core cooling capability and containment integrity (esp. with respect to venting at Fukushima).  This mantra, coupled with INPO’s usual penchant for secrecy, undermines public support for commercial nuclear power.  INPO can be a force for good when its work products, like this report and addendum, are publicly available.  It would be better for the industry if INPO were more transparent and if commercial nuclear power were characterized as a safety-intense industrial process run by ordinary, albeit highly trained, people.

Bottom line, you should read the addendum looking for bits that apply to your own situation.


*  INPO, “Special Report on the Nuclear Accident at the Fukushima Daiichi Nuclear Power Station,” INPO 11-005 Rev. 0 (Nov. 2011).

**  INPO, “Lessons Learned from the Nuclear Accident at the Fukushima Daiichi Nuclear Power Station,” INPO 11-005 Rev. 0 Addendum (Aug. 2012).  Thanks to Madalina Tronea for publicizing this document.  Dr. Tronea is the founder/moderator of the LinkedIn Nuclear Safety discussion group.

***  Regulatory, government and corporate governance lessons learned have been publicized by other Fukushima reviewers and the findings widely distributed, including on Safetymatters.  Click on the Fukushima label to see our related posts. 

Wednesday, April 22, 2015

More Evidence of Weak Safety Culture in DOE

DNFSB Headquarters
We have posted many times about safety culture (SC) issues in the Department of Energy (DOE) empire.  Many of those issues have been raised by the Defense Nuclear Facilities Safety Board (DNFSB), an overseer of DOE activities.  Following is a recent example based on a DNFSB staff report.*

The Radcalc Imbroglio

Radcalc is a computer program used across the DOE complex (and beyond) to determine the transportation package classification for radioactive materials, including radioactive waste, based on the isotopic content.  Radcalc errors could lead to serious consequences, e.g., exposure to radiation or explosions, in the event of a transportation accident.  DOE classified Radcalc as safety software and assigned it the second highest level of rigor in DOE’s software quality assurance (SQA) procedures.

A DNFSB audit found multiple deficiencies with respect to Radcalc, most prominently DOE’s inability to provide any evidence of federal oversight of Radcalc during the software's lifetime (which dates back to the mid-1990s).  In addition, there was no evidence DOE contractors had any Radcalc-related QA plans or programs, or maintained software configuration management.  Neither DOE nor the contractors effectively used their corrective action program to identify and correct software problems.  DNFSB identified other problems but you get the idea.

DNFSB Analysis

As part of its analysis of problems and causes, the DNFSB identified multiple contributing factors including the following related to organization.  “There is an apparent lack of a systematic, structured, and documented approach to determine which organization within DOE is responsible to perform QA audits of contractor organizations.  During the review, different organizations within DOE stated that they thought another organization was responsible for performing Radcalc contractor QA audits.  DOE procedures do not clearly delineate which organization is responsible for QA/SQA audits and assessments.” (Report, p. 4)

Later, the report says “In addition, this review identified potentially significant systemic [emphasis added] concerns that could affect other safety software. These are: inadequate QA/SQA requirement specification in DOE contracts and the lack of policy identifying the DOE organizations in charge of performing QA assessments to ensure compliance; unqualified and/or inadequate numbers of qualified federal personnel to oversee contract work; . . . and additional instances of inadequate oversight of computer work within DOE (e.g., Radtran).” (Report, p. 5)

Our Perspective

Even without the DNFSB pointing out “systemic” concerns, this report practically shouts the question “What kind of SC would let this happen?”  We are talking about a large group of organizations where a significant, safety-related activity failed to take place and the primary reason (excuse) is “Not my group’s job.”  And no one took on the task to determine whose job it was.  This underlying cultural attitude could be as significant as the highly publicized SC problems at individual DOE facilities, e.g., the Hanford Waste Treatment Plant or the Waste Isolation Pilot Plant.

The DNFSB asked DOE to respond to the report within 90 days.  What will DOE’s response say?  Let’s go out on a limb here and predict it will call for “improved procedures, training and oversight.”  The probability of anyone facing discipline over this lapse: zero.  The probability of DOE investigating its own and/or contractor cultures for a possible systemic weakness: also zero.  Why?  Because there’s no money in it for DOE or the contractors, and the DNFSB doesn’t have the organizational or moral authority to force it to happen.

We’ve always championed the DNFSB as the good guys, trying to do the right thing with few resources.  But the sad reality is they are a largely invisible backroom bureaucracy.  When a refinery catches fire, the Chemical Safety Board is front and center explaining what happened and what they’ll recommend to keep it from happening again.  When was the last time you saw the DNFSB on the news or testifying before Congress?  Their former chairman retired suddenly late last year, with zero fanfare; we think it’s highly likely the SC initiative he championed and attempted to promulgate throughout DOE went out the door with him.


*  J.H. Roberson (DNFSB) to D.M. Klaus (DOE), letter (Mar. 16, 2015) with enclosed Staff Issue Report “Review of Federal Oversight of Software Quality Assurance for Radcalc” (Dec. 17, 2014).  Thanks to Bill Mullins for bringing this document to our attention.

Monday, April 13, 2015

Safety-I and Safety-II: The Past and Future of Safety Management by Erik Hollnagel

This book* discusses two different ways of conceptualizing safety performance problems (e.g., near-misses, incidents and accidents) and safety management in socio-technical systems.  This post describes each approach and provides our perspective on Hollnagel’s efforts.  As usual, our interest lies in the potential value new ways of thinking can offer to the nuclear industry.

Safety-I

This is the common way of looking at safety performance problems.  It is reactive, i.e., it waits for problems to arise,** and analytic, i.e., it uses specific methods to work back from a problem to its root causes.  The key assumption is that something in the system has failed or malfunctioned and the purpose of an investigation is to identify the causes and correct them so the problem will not recur.  A second assumption is that chains of causes and effects are linear, i.e., it is actually feasible to start with a problem and work back to its causes.  A third assumption is that a single solution (the “first story”) can be found. (pp. 86, 175-76)***  Underlying biases include the hindsight bias (p. 176) and the belief that the human is usually the weak link. (pp. 78-79)  The focus of safety management is minimizing the number of things that go wrong.

Our treatment of Safety-I is brief because we have reported on criticism of linear thinking/models elsewhere, primarily in the work of Dekker, Woods et al., and Leveson.  See our posts of Dec. 5, 2012; July 6, 2013; and Nov. 11, 2013 for details.

Safety-II

Safety-II is proposed as a different way to look at safety performance.  It is proactive, i.e., it looks at the ways work is actually performed on a day-to-day basis and tries to identify causes of performance variability and then manage them.  A key cause of variability is the regular adjustments people make in performing their jobs in order to keep the system running.  In Hollnagel’s view, “Finding out what these [performance] adjustments are and trying to learn from them can be more important than finding the causes of infrequent adverse outcomes!” (p. 149)  The focus of safety management is on increasing the likelihood that things will go right and developing “the ability to succeed under varying conditions, . . .” (p. 137).

Performance is variable because, among other reasons, people are always making trade-offs between thoroughness and efficiency.  They may use heuristics or have to compensate for something that is missing or take some steps today to avoid future problems.  The underlying assumption of Safety-II is that the same behaviors that almost always lead to successful outcomes can occasionally lead to problems because of performance variability that goes beyond the boundary of the control space.  A second assumption is that chains of causes and effects may be non-linear, i.e., a small variance may lead to a large problem, and may have an emergent aspect where a specific performance variability may occur then disappear or the Swiss cheese holes may momentarily line up exposing the system to latent hazards. (pp. 66, 131-32)  There may be multiple explanations (“second stories”) for why a particular problem occurred.  Finally, Safety-II accepts that there are often differences between Work-as-Imagined (esp. as imagined by folks at the blunt end) and Work-as-Done (by people at the sharp end). (pp. 40-41)***

The Two Approaches

Safety-I and Safety-II are not in some winner-take-all competitive struggle.  Hollnagel notes there are plenty of problems for which a Safety-I investigation is appropriate and adequate. (pp. 141, 146)

Safety-I expenditures are viewed as a cost (to reduce errors). (p. 57)  In contrast, Safety-II expenditures are viewed as bona fide investments to create more correct outcomes. (p. 166)

In all cases, organizational factors, such as safety culture, can impact safety performance and organizational learning. (p. 31)

Our Perspective

The more complex a socio-technical entity is, the more it exhibits emergent properties and the more appropriate Safety-II thinking is.  And nuclear has some elements of complexity.****  In addition, Hollnagel notes that a common explanation for failures that occur in a Safety-I world is “it was never imagined something like that could happen.” (p. 172)  To avoid being the one in front of the cameras saying that, it might be helpful for you to spend a little time reflecting on how Safety-II thinking might apply in your world.

Why do most things go right?  Is it due to strict compliance with procedures?  Does personal creativity or insight contribute to successful plant performance?  Do you talk with your colleagues about possible efficiency-thoroughness trade-offs (short cuts) that you or others make?  Can thinking about why things go right make one more alert to situations where things are heading south?  Does more automation (intended to reduce reliance on fallible humans) actually move performance closer to the control boundary because it removes the human’s ability to make useful adjustments?  Have any of your root cause evaluations appeared to miss other plausible explanations for why a problem occurred?

Some of the Safety-II material is not new.  Performance variability in Safety-II builds on Hollnagel’s earlier work on the efficiency-thoroughness trade-off (ETTO) principle.  (See our Jan. 3, 2013 post.)   His call for mindfulness and constant alertness to problems is straight out of the High Reliability Organization playbook. (pp. 36, 163-64)  (See our May 3, 2013 post.)

A definite shortcoming is the lack of concrete examples in the Safety-II discussion.  If someone has tried to apply Safety-II in an actual organization, it would be nice to hear about it.

Bottom line, Hollnagel has some interesting observations although his Safety-II model is probably not the Next Big Thing for nuclear safety management.

 

*  E. Hollnagel, Safety-I and Safety-II: The Past and Future of Safety Management (Burlington, VT: Ashgate, 2014).

**  In the author’s view, forward-looking risk analysis is not proactive because it is infrequently performed. (p. 57) 

***  There are other assumptions in the Safety-I approach (see pp. 97-104) but for the sake of efficiency, they are omitted from this post.

****  Nuclear power plants have some aspects of a complex socio-technical system but other aspects are merely complicated.   On the operations side, activities are tightly coupled (one attribute of complexity) but most of the internal organizational workings are complicated.  The lack of sudden environmental disrupters (excepting natural disasters) means they have time to adapt to changes in their financial or regulatory environment, reducing complexity.