Monday, September 12, 2011

Understanding the Risks in Managing Risks

Our recent blog posts have discussed the work of anthropologist Constance Perin.  This post looks at her book, Shouldering Risks: The Culture of Control in the Nuclear Power Industry.*  The book presents four lengthy case studies of incidents at three nuclear power plants, along with Perin’s analysis, which aims to explain the cultural attributes that facilitated the incidents’ occurrence or their unfavorable evolution.

This post will focus on the two case studies that concerned hardware issues because they fit nicely with our interest in decision-making.**  The first case involved a leaking, unisolable valve in the reactor coolant system (RCS) that needed repacking, a routine job.  The mechanics put the valve on its backseat, opened it, observed the packing moving up (indicating that the water pressure was too high or the backseat step hadn't worked), and closed it up.  After management meetings to review the situation, the mechanics tried again, packing came out, and the leak became more serious.  The valve stem and disc had separated, a fact that was recognized only belatedly.  The leak was eventually controlled well enough that the plant could wait until the next outage to repair or replace the valve.

The second case involved a switchyard transformer that exhibited a hot spot during a thermography examination.  Managers initially thought they had a circulating current issue, a common problem.  After additional investigations, including people climbing ladders alongside the transformer, a cover bolt was removed and an employee saw a glow inside the transformer, the result of a major short.  Transformers can explode, and have exploded, from such thermal stresses, but the plant was able to shut down safely to repair or replace the transformer.

In both cases, there was at least one individual who knew (or strongly suspected) that something more serious was wrong from the get-go but was unable to get the rest of the organization to accept a more serious, i.e., costly, diagnosis.

Why were the plant organizations so willing, even eager, to assume the more conventional explanations for the problems they were seeing?  Perin provides a multidimensional framework that helps answer that question.

The first dimension is the tradeoff quandary, the ubiquitous tension between production and cost, including costs associated with safety.  Plant organizations are expected to be making electricity, at a budgeted cost, and that subtle (or not-so-subtle) pressure colors the discussion of any problem.  There is usually a preference for a problem explanation and corrective action that allows the plant to continue running.

Three control logics constitute a second dimension.  The calculated logics are the theory of how a plant is (or should be) designed, built, and operated.  The real-time logics consist of the knowledge of how things actually work in practice.  Policy logics come from above, and represent generalized guidelines or rules for behavior, including decision-making.  An “answer” that comes from calculated or policy logic will be preferred over one that comes from real-time logic, partly because the former have been developed by higher-status groups and partly because such answers are more defensible to corporate bosses and regulators.

Finally, traditional notions of group and individual status and a key status property, credibility, populate a third dimension: design engineers over operators over system engineers over maintenance over others; managers over individual contributors; old-timers over newcomers.  Perin creates a construct of the various "orders"*** in a plant organization, specialists such as operators or system engineers.  Each order has its own worldview, values and logics – optimum conditions for nurturing organizational silos.  Information and work flows are mediated among different orders via plant-wide programs (themselves products of calculated and policy logics).
 
Application to Cases

The aforementioned considerations can be applied to the two cases.  Because the valve was part of the RCS, it should have been subject to more detailed planning, including additional risk analysis and contingency prep.  This was pointed out by a new-to-his-job work planner who was basically ignored because of his newcomer status.  And before the work was started, the system engineer (SE) observed that this type of valve (which had a problem history at this plant and elsewhere) was prone to stem/disc separation and that this particular valve appeared to have the problem based on his visual inspection (it had one thread less visible than other similar valves).  But the SE did not make his observations forcefully and/or officially (by initiating a CR), so his (accurate) observation was not factored into the early decision-making.  Ultimately, neither man's concerns swayed the overall discussion, where the schedule was the highest priority.  A radiographic examination that would have shown the stem/disc separation was not performed early on because that was an Engineering responsibility and the valve repair was a Maintenance project.

The transformer is on the non-nuclear side of the plant, which makes the attitudes toward it less focused and critical than for safety-related equipment.  The hot spot was discovered by a tech who was working with a couple of thermography consultants.  Thermography was a relatively new technology at this plant and not well understood by plant managers (nor trusted, because early applications had given false alarms).  The tech said that the patterns he observed were not typical for circulating currents, but neither he nor the consultants (the three people on-site who understood thermography) were in the meetings where the problem was discussed.  The circulating current theory was popular because (a) the plant had experienced such problems in the past and (b) addressing it could be done without shutting down the plant.  Production pressure, the nature of past problems, and the lower status of roles and equipment that are not safety-related all acted to suppress the emergent new knowledge of what the problem actually was.

Lessons Learned

Perin’s analytic constructs are complicated and not light reading.  However, the interviews in the case studies are easy to read and very revealing.  It will come as no surprise to people with consulting backgrounds that the interviewees were capable of significant introspection.  In the harsh light of hindsight, lots of folks can see what should (and could) have happened.  

The big question is what did those organizations learn?  Will they make the same mistakes again?  Probably not.  But will they misinterpret future weak or ambiguous signals of a different nascent problem?  That’s still likely.  “Conventional wisdom” codified in various logics and orders and guided by a production imperative remains a strong force working against the open discussion of alternative explanations for new experiences, especially when problem information is incomplete or fuzzy.  As Bob Cudlin noted in his August 17, 2011 post: [When dealing with risk-imbued issues] “the intrinsic uncertainties in significance determination opens the door to the influence of other factors - namely those ever present considerations of cost, schedule, plant availability, and even more personal interests, such as incentive programs and career advancement.”

   
*  C. Perin, Shouldering Risks: The Culture of Control in the Nuclear Power Industry, (Princeton, NJ: Princeton University Press, 2005).

**  The case studies and Perin’s analysis have been greatly summarized for this blog post.

***  The “orders” include outsiders such as NRC, INPO or corporate overseers.  Although this may not be totally accurate, I picture orders as akin to medieval guilds.
