Thursday, August 13, 2009

Primer on System Dynamics

System Dynamics is an approach to understanding how complex systems change over time, in which internal feedback loops and time delays affect system behavior and can lead to complex, non-linear changes in performance.
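
To make the idea concrete, here is a minimal sketch in Python (our own illustration, not part of any SD tool; all names and parameter values are hypothetical).  A single stock adjusts toward a goal through a feedback-controlled flow, and a time delay in the feedback loop is enough to turn smooth adjustment into overshoot and oscillation.

    # Minimal stock-and-flow sketch (hypothetical names and values).
    # A stock adjusts toward a goal via a balancing feedback loop; a
    # perception delay in the loop turns smooth adjustment into a
    # damped oscillation.

    DT = 0.25               # integration time step
    GOAL = 100.0            # target level for the stock
    ADJUST_TIME = 2.0       # how aggressively the gap is closed
    PERCEPTION_DELAY = 4.0  # time needed to perceive the true level

    stock = 20.0
    perceived = stock

    for step in range(int(40 / DT)):
        # First-order delay: perception lags the actual stock level.
        perceived += (stock - perceived) / PERCEPTION_DELAY * DT
        # Balancing feedback: flow closes the *perceived* gap.
        flow = (GOAL - perceived) / ADJUST_TIME
        stock += flow * DT
        if step % int(4 / DT) == 0:
            print(f"t={step * DT:5.1f}  stock={stock:7.2f}  flow={flow:6.2f}")

Shorten the delay and the oscillation fades; lengthen it and the swings persist.  That sensitivity to delay is exactly the kind of counter-intuitive behavior System Dynamics is meant to expose.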

The System Dynamics worldview was originally developed by Prof. Jay Forrester at MIT. Later work by other thinkers, e.g., Peter Senge, author of The Fifth Discipline, expanded the original concepts and made them available to a broader audience. An overview of System Dynamics can be found on Wikipedia.

Our NuclearSafetySim program uses System Dynamics to model managerial behavior in an environment where maintaining the nuclear safety culture is a critical element. NuclearSafetySim is built using isee Systems' iThink software. isee Systems has educational materials available on its website that explain some basic concepts.

There are other vendors in the System Dynamics software space, including Ventana Systems and their Vensim program. They also provide some reference materials, available here.

Thursday, August 6, 2009

Signs of a Reactive Organization (MIT #6)

One of the most important insights to be gained from a systems perspective on safety management concerns the effectiveness of various responses to changes in system conditions.  Recall that in our post #3 on the MIT paper, we talked about single versus double loop learning.  Single loop responses are short term, local reactions to perceived problems, while double loop learning means understanding the underlying reasons for the problems and finding long term solutions.  As you might guess, single loop responses tend to be reactive.  “An oscillating incident rate is the hallmark of a reactive organization, where successive crises lead to short term fixes that persist only until the next crisis.” [pg 22]  We can use our NuclearSafetySim model to illustrate differing approaches to managing problems.

The figure below illustrates how the number of problems/issues (we use the generic term "challenges" in NuclearSafetySim) might vary with time when the response is reactive.  The blue line indicates the total number of issues, the pink line the number of new issues being identified, and the green line the resolution rate for issues, e.g., through a corrective action program.  Note that the blue line initially increases and then oscillates while the pink line is relatively constant.  The oscillation derives from the management response, reflected in the green line: there is an initial delay in responding to an increased number of issues, then resolution rates are greatly increased to address the higher backlog, then reduced (due to budgetary pressures and other priorities) when the backlog starts to fall, precipitating another cycle of increasing issues.

[Figure: reactive response, showing total issues (blue), new issues identified (pink) and resolution rate (green) over time]
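
For readers who like to experiment, the reactive pattern is easy to reproduce outside of iThink.  The following is a rough sketch in Python of the dynamic just described, not the actual NuclearSafetySim model; all names and values are invented.  Management reacts only to a delayed perception of the backlog, so effort ramps up late and is cut back as soon as the crisis appears to have passed.

    # Rough sketch of the reactive dynamic (hypothetical, not the
    # actual NuclearSafetySim model).  A step increase in new issues
    # at t=10 kicks off the cycle; the delayed response produces the
    # oscillating backlog described in the text.

    DT = 0.1
    PERCEPTION_DELAY = 6.0   # time for management to register backlog growth
    TARGET_BACKLOG = 50.0    # backlog level management considers acceptable

    backlog = 50.0           # open issues (the blue line)
    perceived = backlog      # management's delayed view of the backlog

    for step in range(int(100 / DT)):
        t = step * DT
        new_issues = 10.0 if t < 10 else 14.0   # pink line: step increase
        # First-order delay: perception lags the actual backlog.
        perceived += (backlog - perceived) / PERCEPTION_DELAY * DT
        # Reactive policy: resolution effort (green line) tracks the
        # *perceived* excess, rising late and falling once the crisis
        # seems over.
        resolution = max(0.0, 10.0 + 0.5 * (perceived - TARGET_BACKLOG))
        backlog += (new_issues - resolution) * DT
        if step % int(10 / DT) == 0:
            print(f"t={t:5.1f}  backlog={backlog:6.1f}  resolution={resolution:5.1f}")
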
Compare the oscillatory response above to the next figure, where an increase in issues immediately results in higher resolution rates that are maintained over a period sufficient to return the system to a lower level of backlogs.  In parallel, budgets are increased to address the underlying causes of issues, driving down the occurrence rate of new issues and ultimately bringing the backlog down to a long-term sustainable level.

[Figure: proactive response, with sustained higher resolution rates and a declining issues backlog over time]
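
Continuing the hypothetical sketch above, the proactive case amounts to two policy changes: respond promptly and sustainably to the actual backlog, and invest in parallel to drive down the rate at which new issues arise.

    # Proactive variant of the sketch above (same caveats apply).
    # Prompt, sustained response to the actual backlog, plus parallel
    # investment that gradually reduces the arrival rate of new issues.

    DT = 0.1
    backlog = 50.0
    new_issue_rate = 14.0    # elevated arrival rate of new issues

    for step in range(int(100 / DT)):
        # Prompt and sustained: effort tracks the actual backlog.
        resolution = max(0.0, 12.0 + 0.5 * (backlog - 50.0))
        # Investment drives the underlying cause rate back down.
        new_issue_rate += (10.0 - new_issue_rate) / 20.0 * DT
        backlog += (new_issue_rate - resolution) * DT
        if step % int(10 / DT) == 0:
            print(f"t={step * DT:5.1f}  backlog={backlog:6.1f}  new={new_issue_rate:5.1f}")

With no perception delay and no premature budget cuts, the backlog declines smoothly to a sustainable level instead of oscillating.
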
The last figure shows some of the ramifications of system management for safety culture and employee trust.  The significant increase in the issues backlog initially leads to a degradation of employee trust (the pink line) and an erosion of safety culture (blue line).  However, the nature and effectiveness of the management response in bringing down backlogs and reducing new issues reverses the trust trend line and rebuilds safety culture over time.  Note that the red line, representing plant performance, is relatively unchanged over the same period, indicating that performance issues may exist under the cover of a consistently operating plant.

Tuesday, August 4, 2009

The Economist on Computer Simulation

The Economist has occasional articles on the practical applications of computer simulation. Following are a couple of items that have appeared in the last year.

Agent-based simulation is used to model the behavior of crowds. "Agent-based" means that each simulated individual has some capacity to ascertain what is going on in the environment and act accordingly. This approach is being used to simulate the movement of people in a railroad station or during a building fire. On a much larger scale, each of the computer-generated orcs in the "Lord of the Rings" battle scenes moved independently based on its immediate surroundings.

Link to article.
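
Neither article includes code, but the core idea is easy to sketch.  Here is a hypothetical toy in Python (our own, not from the article) in which each agent decides its next move using only its local view of the corridor, and congestion at the exit emerges from those individual decisions.

    import random

    # Hypothetical agent-based toy: each agent acts only on its local
    # view (how crowded the cell just ahead is), yet crowd-level
    # behavior such as queueing at the exit emerges on its own.

    CORRIDOR = 30                      # cells; the exit is at cell 0
    agents = [random.randint(5, CORRIDOR - 1) for _ in range(40)]

    for tick in range(60):
        occupancy = {}
        for pos in agents:
            occupancy[pos] = occupancy.get(pos, 0) + 1
        still_inside = []
        for pos in agents:
            if pos == 0:
                continue               # reached the exit; leave the sim
            crowded = occupancy.get(pos - 1, 0) >= 3
            # Local rule: step toward the exit unless the next cell is jammed.
            still_inside.append(pos if crowded else pos - 1)
        agents = still_inside
        if tick % 10 == 0:
            print(f"tick {tick:2d}: {len(agents)} agents still in the corridor")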

The second article is a brief review of simulation's use in business applications, including large-scale systems (e.g., an airline), financial planning, forecasting, process mapping and Monte Carlo analysis. This is a quick read on the ways simulation is used to illustrate and analyze a variety of complex situations.

Link to article.
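
As a one-screen illustration of the Monte Carlo idea mentioned above (our own hypothetical numbers, not from the article), here is how the spread of a project's total cost can be estimated by repeatedly sampling uncertain task costs:

    import random

    # Hypothetical Monte Carlo illustration: estimate the distribution
    # of a project's total cost from three uncertain task costs, each
    # modeled as a triangular (low, most likely, high) distribution.

    TASKS = [(8, 10, 15), (4, 5, 9), (10, 12, 20)]   # (low, likely, high)
    TRIALS = 100_000

    totals = sorted(
        sum(random.triangular(low, high, likely) for low, likely, high in TASKS)
        for _ in range(TRIALS)
    )
    print(f"median total cost : {totals[TRIALS // 2]:.1f}")
    print(f"90th percentile   : {totals[int(TRIALS * 0.9)]:.1f}")

A single-point estimate would simply add the "most likely" values; the simulation shows how often, and by how much, the total can exceed that figure.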

Other informational resources that discuss simulation are included on our References page.

Monday, August 3, 2009

Reading List: Just Culture by Sidney Dekker

Thought I would share with you a relatively recent addition to the safety management system bookshelf, Just Culture by Sidney Dekker, Professor of Human Factors and System Safety at Lund University in Sweden.  In Dekker’s view, a “just culture” is critical for the creation of safety culture.  A just culture will not simply assign blame in response to a failure or problem; it will seek to use accountability as a means to understand the system-based contributors to failure and resolve them in a manner that will avoid recurrence.  One of the reasons we believe so strongly in safety simulation is the emphasis on system-based understanding, including a shared organizational mental model of how safety management happens.  One reviewer (D. Sillars) of this book on the amazon.com website summarizes, “’Just culture’ is an abstract phrase, which in practice, means . . . getting to an account of failure that can both satisfy demands for accountability while contributing to learning and improvement.”


Question for nuclear professionals:  Does your organization maintain a library of resources such as Just Culture or Diane Vaughan’s book, The Challenger Launch Decision, that provide deep insights into organizational performance and culture?  Are materials like this routinely the subject of discussions in training sessions and topical meetings?

Thursday, July 30, 2009

“Reliability is a Dynamic Non-Event” (MIT #5)

What is this all about?  “Reliability is a dynamic non-event” [MIT paper pg 5].  It is about complacency.  Paradoxically, when incident rates are low for an extended period and management does not maintain a high priority on safety, the organization may slip into complacency as individuals shift their attention to other priorities such as production pressures.  The MIT authors note the parallel to the NASA space program, where incidents were rare notwithstanding a weak safety culture, resulting in the organization rationalizing its performance as “normal.”  (See Diane Vaughan’s book The Challenger Launch Decision for a compelling account of NASA’s organizational dynamics.)  In our paper “Practicing Nuclear Safety Management” we make a similar comparison.

What does this imply about the nuclear industry?  Certainly we are in a period where the reliability of the plants is at a very high level and the NRC Reactor Oversight Process (ROP) indicator board is very green.  Is this positive for maintaining high safety culture levels or does it represent a potential threat?  It could be the latter, since the biggest problem in addressing the safety implications of complacency in an organization is, well, complacency.

Wednesday, July 29, 2009

Self Preservation (MIT #4)

The MIT paper [pg 7] introduces the concept of feedback loops, an essential ingredient of system dynamics and critical to understanding the dynamics of safety management.  The MIT authors suggest that there is a “weak balancing loop” associated with individuals responding to a perceived personal threat from increased incident rates.  While the authors acknowledge it is a weak feedback, I would add that, at best, it represents an idealized effect and is hard to differentiate from other feedback that individuals receive, such as management reaction to incidents and pressures associated with cost and plant performance.  The MIT paper [pg 8] goes on to address management actions and states, “When faced with an incident rate that is too high, the natural and most immediately effective response for managers is to focus the blame on individual compliance with rules.”  Note the conditional phrase “most immediately effective”; it is an example of single loop learning as described in one of my prior posts (MIT #3).  Certainly the fact that procedure adherence is an issue that recurs at many nuclear plants suggests that the “blame game” has limited and short term effectiveness.
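
To make "loop strength" concrete, here is a hypothetical sketch (mine, not from the MIT paper) of the balancing loop the authors describe: incidents raise perceived personal risk, risk raises individual caution, and caution lowers the incident rate, all while a constant production pressure pushes the other way.  The only difference between the two runs is the gain of the loop.

    # Hypothetical sketch of loop strength (not from the MIT paper).
    # A balancing loop (incidents -> perceived risk -> caution ->
    # fewer incidents) opposes a constant production pressure.  A weak
    # gain barely restrains the incident rate; a strong one does.

    DT = 0.5

    def run(gain, label):
        incidents = 5.0      # incident rate
        caution = 0.0        # extra care taken by individuals
        for _ in range(int(50 / DT)):
            # More incidents -> more perceived risk -> more caution.
            caution += (gain * incidents - caution) / 5.0 * DT
            # Production pressure pushes incidents up; caution pushes down.
            incidents = max(0.0, incidents + (2.0 - 0.4 * caution) * DT)
        print(f"{label}: incident rate after 50 time units = {incidents:.1f}")

    run(gain=0.1, label="weak balancing loop  ")
    run(gain=2.0, label="strong balancing loop")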

My sense is that the self-preservation effect is one that exists deeply embedded within the larger safety climate of the organization.  In that climate, how strictly is rule adherence observed?  Are procedures and processes of sufficient quality to encourage observance?  If procedures and processes are ambiguous or even incorrect, and left uncorrected, is there tacit approval of alternate methods?  The reality is that self-preservation can act in several directions: it may impel compliance, if that is truly the organizational ethic, or it may rationalize non-compliance, if that is the organizational expectation.  Life is difficult.

"Beaten to Death by Croutons"

The Bookshelf column of the July 27, 2009 Wall Street Journal reviews "Say Everything," a book about blogging.  The reviewer comments that "reading blogs is like being beaten to death by croutons."  We hope that readers of our blog do not experience such a fate.  The column goes on to note that the best blogs are those that are concise, current, and precisely targeted.  That is the goal for this blog, and we hope it is being met.