
Thursday, November 20, 2014

The McKinsey Quarterly at 50 Years



The Quarterly’s mission is to help define the senior management agenda; this anniversary issue* is focused on McKinsey’s vision for the future of management. (p. 1)  The issue is organized around several themes (strategy, productivity, etc.) but we’re interested in how it addresses culture.  The word appears in several articles, but usually in passing or in a way not readily applied to nuclear safety culture.  There were, however, a few interesting tidbits.  

One article focused on artificial intelligence as a sweeping technological change with exponential impacts on business.  An interviewee opined that the current senior management culture, based on domain expertise, will need to give way to one that is data-driven.  “[D]ata expertise is at least as important [as domain expertise] and will become exponentially more important.  So this is the trick.  Data will tell you what’s really going on, whereas domain expertise will always bias you toward the status quo, and that makes it very hard to keep up with these disruptions.” (p. 73)  Does the culture of the nuclear industry ignore or undervalue disruptions of all types because they may threaten the status quo?

McKinsey’s former managing director listed several keys to corporate longevity, including “creating a culture of dissatisfaction with current performance, however good” and “focus[ing] relentlessly on values . . . A company’s values are judged by actions and behavior, not words and mission statements.” (pp. 121-22)  The first point reinforces the concept of a learning organization; the second the belief that behavior, e.g., the series of decisions made in an organization, is culture-in-action.  Any design for a strong safety culture should consider both.

Lou Gerstner (the man who saved IBM) also had something to say about values in action: “The rewards system is a powerful driver of behavior and therefore culture. Teamwork is hard to cultivate in a world where employees are paid solely on their individual performance.” (p. 126)  We have long argued that executive compensation schemes that pay more for production or cost control than safety send an accurate, although inappropriate, signal of what’s really important throughout the organization.

Finally, management guru Tom Peters had some comments about leadership.  “If you take a leadership job, you do people.  Period.  It’s what you do. It’s what you’re paid to do.  People, period.  Should you have a great strategy?  Yes, you should.  How do you get a great strategy?  By finding the world’s greatest strategist, not by being the world’s greatest strategist.  You do people.  Not my fault.  You chose it.  And if you don’t get off on it, do the world a favor and get the hell out before dawn, preferably without a gilded parachute.  But if you want the gilded parachute, it’s worth it to get rid of you.” (p. 93)  Too simplistic?  Probably, but the point that senior managers have to spend significant time identifying, developing and keeping the most qualified people is well-taken.

Our Perspective

None of this is groundbreaking news.  But in a world awash in technology innovations and “big data” it’s interesting that one of the world’s foremost management consulting practices still sees a major role for culture in management’s future.


*  McKinsey Quarterly, no. 3 (2014).

Friday, June 27, 2014

Reaction to the Valukas Report on GM Ignition Switch Problems

CEO Mary Barra and Anton Valukas
General Motors released the report* by its outside attorney, Anton Valukas, investigating the hows and whys of the failure to recall Chevy Cobalts due to faulty ignition switches.  We blogged on these issues and the choice of Mr. Valukas on May 19, 2014 and May 22, 2014, indicating our concern that his law firm had prior and ongoing ties to GM.  The report is big, 314 pages, and for some reason is marked “Confidential, Attorney-Client Privileged”.  This is curious for a report that was always intended to be public, and it tends to highlight that Valukas and GM are in a proprietary relationship - perhaps not the level of independence one might expect for this type of assessment.

Our take, in brief, is that the Valukas report documents the "hows" but not the "whys" of what happened.  In fact it appears to be a classic legal analysis of facts based on numerous interviews of “witnesses” and reviews of documentation.  It is heavy with citations and detail but it lacks any significant analysis of the events or insight as to why people did or did not do things.  “Culture” is the designated common mode failure.  But there is no exploration of extent of condition or even consideration of why GM’s safety processes failed in the case of the Cobalt but have been effective in many other situations.  Its recommendations for corrective actions by GM are bland, programmatic and process-intensive, and lack any demonstrable link to addressing the underlying issues.  For its part, GM has accepted the findings, fired 15 low-level engineers and promised a new culture.

The response to the report has reflected the inherent limitations and weaknesses of the assessment.  There have been many articles written about the report that provide useful perspectives.  An example is a column in the Wall Street Journal by Holman Jenkins titled “GM’s Cobalt Report Explains Nothing."**  In a nutshell that sums it up pretty well.  It is well worth reading in its entirety.

Congressional response has also been quite skeptical.  On June 18, 2014 the House Committee on Energy and Commerce, Subcommittee on Oversight and Investigations, held a hearing with GM CEO Barra and Valukas testifying.  A C-SPAN video of the proceeding is available and is of some interest.***  Questioning by subcommittee members focused on the systemic nature of the problems at GM, how GM hoped to change an entrenched culture, and the credibility of the findings that malfeasance did not extend higher into the organization.

The Center for Auto Safety, perhaps predictably, was not impressed with the report, stating: “The Valukas Report is clearly flawed in accepting GM’s explanation that its engineers and senior managers did not know stalling was safety related.”****

Why doesn’t the Valukas report explain more?  There are several possibilities.  Mr. Valukas is an attorney.  Nowhere in the report is there a delineation of the team assembled by Mr. Valukas or their credentials. It is not clear if the team included expertise on complex organizations, safety management or culture.  We suspect not.  The Center for Auto Safety asserts that the report is a shield for GM against potential criminal liability.  Impossible for us to say.  Congressional skepticism seemed to reflect a suspicion that the limited scope of the investigation was designed to protect senior GM executives.  Again hard to know but the truncated focus of the report is a significant flaw.

What is clear from these reactions to the report is that, at a minimum, it is ineffective in establishing that a full and expert analysis of GM’s management performance has been achieved.  Assigning fault to the GM culture is at once too vague and ultimately too convenient in avoiding more specific accountability.  It also suggests that internally GM has not come to grips with the fundamental problems in its management system and decision making.  If so, it is hard to believe that the corrective actions being taken will be effective in changing that system or assuring better safety performance going forward.


*  A.R. Valukas, "Report to Board of Directors of General Motors Company Regarding Ignition Switch Recalls" (May 29, 2014).

**  H.W. Jenkins, Jr., "GM's Cobalt Report Explains Nothing," Wall Street Journal (June 6, 2014).

***  C-SPAN, "GM Recall Testimony" (June 18, 2014).  Retrieved June 26, 2014.

****  C. Ditlow (Center for Auto Safety), letter to A.R. Valukas (June 17, 2014), p. 3.  Retrieved June 26, 2014.

Monday, May 19, 2014

GM Part 2

In our April 16, 2014 post we discussed the evolving situation at General Motors regarding the issues with the Chevy Cobalt’s ignition switches.  We highlighted the difficulties GM was encountering in piecing together how decisions were made regarding re-design and possible vehicle recalls, and who in the management chain was involved and/or aware of the issues.  As we noted, GM had initiated an internal investigation of the matter with the results expected by late May.

A recent Wall Street Journal article* provides some further perspective on how things are moving forward.  For one, the GM Board has now instituted its own investigation of how information flowed to the Board and how it affected its oversight function.  An outside law firm is conducting that investigation.

Perhaps of more interest are some comments in the article regarding the separate investigation being conducted on behalf of GM’s management.  It is being conducted by a former U.S. attorney, Anton Valukas, who also happens to be Chairman of the law firm Jenner & Block.  The WSJ article notes “some governance experts have questioned whether Mr. Valukas has enough of an arm's-length relationship with GM management. Jenner & Block has long advised GM management.”  It does seem to raise a basic conflict of interest issue, providing legal services to GM and conducting an independent investigation.  But a source quoted in the WSJ article notes that GM does not see a problem since “Mr. Valukas' own integrity is on the line…”

In terms of the specific situation it seems fairly clear to us that Valukas should not be performing the investigation on behalf of management.  The Board of Directors should have initiated the primary investigation using an independent outside firm - essentially what it has now done but which is limited to the narrow issue of information flow to the Board.  Having current management sponsor an investigation of itself using a firm with commercial ties to GM will not result in high confidence in its findings.

In a broader sense this situation models the contours of a wider problem associated with ensuring safety in complex organizational systems.  In the GM case the assurance of a completely objective and thorough investigation seems to come down to the personal integrity of Mr. Valukas.  While we have no reason to doubt his credentials or integrity, he is being placed in a situation where an aggressive investigation could have negative impacts on GM and its management - who are clients of Mr. Valukas’ law firm.  In addition this investigation will involve products liability issues which inevitably involve GM’s internal lawyers; in all probability Valukas’ firm has professional relationships with these lawyers making it a particularly sensitive situation.  It is certainly possible that Mr. Valukas will be immune to any implicit pressures due to these circumstances, but it is an approach that puts maximum reliance on the individual to do the “right” thing notwithstanding competing interests.  And in any event, the perception of an investigation of this type will always be subject to some question where conflicts are present.

We also see an interesting analogy to nuclear operations, where reliance on safety culture is, in essence, reliance on personal integrity.  We are not implying there is anything wrong with expecting and emphasizing personal integrity; however, all too often it becomes a panacea, expected to counter significant cost or other operational pressures and ensure that safety is accorded proper priority.  And if things go wrong, it is the norm that individuals are blamed and, often, replaced.  In essence they failed the integrity test.  Why they failed, the elephant in the room, is hardly ever pursued.  Rarely if ever do corrective actions address minimizing or eliminating the influence of those conflicts, leaving the situation ripe for further failures.


*  J.S. Lublin and J. Bennett, “GM Directors Ask Why Cobalt Data Didn't Reach Them,” Wall Street Journal (May 14, 2014).

Wednesday, April 16, 2014

GM’s CEO Revealing Revelation

GM CEO Mary Barra
As most of our readers are aware General Motors has been much in the news of late regarding a safety issue associated with the ignition switches in the Chevy Cobalt.  At the beginning of April the new CEO of GM, Mary Barra, testified at Congressional hearings investigating the issue.  A principal focus of the hearings was the extent to which GM executives were aware of the ignition switch issues which were identified some ten years ago but did not result in recalls until this February.  Barra has initiated a comprehensive internal investigation of the issues to determine why it took so many years for a safety defect to be announced.

In a general sense this sounds all too familiar as the standard response to a significant safety issue.  Launch an independent investigation to gather the facts and figure out what happened, who knew what, who decided what and why.  The current estimate is that it will take almost two months for this process to be completed.  Also familiar is that accountability inevitably starts (and often ends) at the engineering and lower management levels.  To wit, GM has already announced that two engineers involved in the ignition switch issues have been suspended.

But somewhat buried in Barra’s Congressional testimony is an unusually revealing comment.  According to the Wall Street Journal, Barra said “senior executives in the past were intentionally not involved in details of recalls so as to not influence them.”*  Intentionally not involved in decisions regarding recalls - recalls which can involve safety defects and product liability issues and have significant public and financial liabilities.  Why would you not want the corporation's executives to be involved?  And if one is to believe the rest of Barra’s testimony, it appears executives were not even aware of these issues.

Well, what if executives were involved in these critical decisions - what influence could they have that GM would be afraid of?  Certainly if executive involvement would assure that technical assessments of potential safety defects were rigorous and conservative - that would not be undue influence.  So that leaves the other possibility - that involvement of executives could inhibit or constrain technical assessments from assuring an appropriate priority for safety.  This would be tantamount to the chilling effect popularized in the nuclear industry.  If management involvement creates an implicit pressure to minimize safety findings, there goes the safety conscious work environment and safety.


If keeping executives out of the decision process is believed to yield “better” decisions, it says some pretty bad things about either their competence or ethics.  Having executives involved should at least ensure that they are aware and knowledgeable of potential product safety issues and in a position to proactively assure that decisions and actions are appropriate.   What might be the most likely explanation is that executives don’t want the responsibility and accountability for these types of decisions.  They might prefer to remain protected at the safety policy level but leave the messy reality of comporting those dictates with real world business considerations to lower levels of the organization.  Inevitably accountability rolls downhill to somebody in the engineering or lower management ranks. 

One thing is certain: whatever the substance and process of GM’s decision, it is not transparent, probably not well documented, and now requires a major forensic effort to reconstitute what happened and why.  This is not unusual and it is the standard course in other industries including nuclear generation.  Wouldn’t we be better off if decisions were routinely subject to the rigor of contemporaneous recording, including how complex and uncertain safety issues are decided in the context of other business priorities, and by whom?



*  J.B. White and J. Bennett, "Some at GM Brass Told of Cobalt Woe," Wall Street Journal online (Apr. 11, 2014)

Thursday, March 13, 2014

Eliminate the Bad Before Attempting the Good

An article* in the McKinsey Quarterly suggests executives work at rooting out destructive behaviors before attempting to institute best practices.  The reason is simple: “research has found that negative interactions with bosses and coworkers [emphasis added] have five times more impact than positive ones.” (p. 81)  In other words, a relatively small amount of bad behavior can keep good behavior, i.e., improvements, from taking root.**  The authors describe methods for removing bad behavior and warning signs that such behavior exists.  This post focuses on their observations that might be useful for nuclear managers and their organizations.

Methods

Nip Bad Behavior in the Bud — Bosses and coworkers should establish zero tolerance for bad behavior but feedback or criticism should be delivered while treating the target employee with respect.  This is not about creating a climate of fear, it’s about seeing and responding to a “broken window” before others are also broken.  We spoke a bit about the broken window theory here.

Put Mundane Improvements Before Inspirational Ones/Seek Adequacy Before Excellence — Start off with one or more meaningful objectives that the organization can achieve in the short term without transforming itself.  Recognize and reward positive behavior, then build on successes to introduce new values and strategies.  Because people are more than twice as likely to complain about bad customer service as to mention good customer service, management intervention should initially aim at getting the service level high enough to staunch complaints, then work on delighting customers.

Use Well-Respected Staff to Squelch Bad Behavior — Identify the real (as opposed to nominal or official) group leaders and opinion shapers, teach them what bad looks like and recruit them to model good behavior.  Sounds like co-opting (a legitimate management tool) to me.

Warning Signs

Fear of Responsibility — This can be exhibited by employees doing nothing rather than doing the right thing, or their ubiquitous silence.  It is related to bystander behavior, which we posted on here.

Feelings of Injustice or Helplessness — Employees who believe they are getting a raw deal from their boss or employer may act out, in a bad way.  Employees who believe they cannot change anything may shirk responsibility.

Feelings of Anonymity — This basically means employees will do what they want because no one is watching.  This could lead to big problems in nuclear plants because they depend heavily on self-management and self-reporting of problems at all organizational levels.  Most of the time things work well but incidents, e.g., falsification of inspection reports or test results, do occur.

Our Perspective

The McKinsey Quarterly is a forum for McKinsey people and academics whose work has some practical application.  This article is not rocket science but sometimes a simple approach can help us appreciate basic lessons.  The key takeaway is that an overconfident new manager can sometimes reach too far, and end up accomplishing very little.  The thoughtful manager might spend some time figuring out what’s wrong (the “bad” behavior) and develop a strategy for eliminating it and not simply pave over it with a “get better” program that ignores underlying, systemic issues.  Better to hit a few singles and get the bad juju out of the locker room before swinging for the fences.


*  H. Rao and R.I. Sutton, “Bad to great: The path to scaling up excellence,” McKinsey Quarterly, no. 1 (Feb. 2014), pp. 81-91.  Retrieved Mar. 13, 2014.

**  Even Machiavelli recognized the disproportionate impact of negative interactions.  “For injuries should be done all together so that being less tasted they will give less offence.  Benefits should be granted little by little so that they may be better enjoyed.”  The Prince, ch. VIII.

Wednesday, February 12, 2014

Left Brain, Right Stuff: How Leaders Make Winning Decisions by Phil Rosenzweig

In this new book* Rosenzweig extends the work of Kahneman and other scholars to consider real-world decisions.  He examines how the content and context of such decisions is significantly different from controlled experiments in a decision lab.  Note that Rosenzweig’s advice is generally aimed at senior executives, who typically have greater latitude in making decisions and greater responsibility for achieving results than lower-level professionals, but all managers can benefit from his insights.  This review summarizes the book and explores its lessons for nuclear operations and safety culture. 

Real-World Decisions

Decision situations in the real world can be more “complex, consequential and laden with uncertainty” than those described in laboratory experiments. (p. 6)  A combination of rigorous analysis (left brain) and ambition (the right stuff—high confidence and a willingness to take significant risks) is necessary to achieve success. (pp. 16-18)  The executive needs to identify the important characteristics of the decision he is facing.  Specifically,

Can the outcome following the decision be influenced or controlled?

Some real-world decisions cannot be controlled, e.g., the price of Apple stock after you buy 100 shares.  In those situations the traditional advice to decision makers, viz., be rational, detached, analyze the evidence and watch out for biases, is appropriate. (p. 32)

But for many decisions, the executive (or his team) can influence outcomes through high (but not excessive) confidence, positive illusions, calculated risks and direct action.  The knowledgeable executive understands that individuals perceived as good executives exhibit a bias for action and “The essence of management is to exercise control and influence events.” (p. 39)  Therefore, “As a rule of thumb, it's better to err on the side of thinking we can get things done rather than assuming we cannot.  The upside is greater and the downside less.” (p. 43)

Think about your senior managers.  Do they under or over-estimate their ability to influence future performance through their decisions?

Is the performance based on the decision(s) absolute or relative?

Absolute performance is described using some system of measurement, e.g., how many free throws you make in ten attempts or your batting average over a season.  It is not related to what anyone else does. 

But in competition, performance is relative to rivals.  Ten percent growth may not be sufficient if a rival grows fifty percent.**  In addition, payoffs for performance may be highly skewed: in the Olympics, there are three medals and the others get nothing; in many industries, the top two or three companies make money, the others struggle to survive; in the most extreme case, it's winner-take-all and everyone else gets nothing.  It is essential to take risks to succeed in highly skewed competitive situations.

Absolute and relative performance may be connected.  In some cases, “a small improvement in absolute performance can make an outsize difference in relative performance, . . .” (p. 66)  For example, if a well-performing nuclear plant can pick up a couple percentage points of annual capacity factor (CF), it can make a visible move up the CF rankings thus securing bragging rights (and possibly bonuses) for its senior managers.

For a larger example, remember when the electricity markets deregulated and many utilities rushed to buy or build merchant plants?  Note how many have crawled back under the blanket of regulation where they only have to demonstrate prudence (a type of absolute performance) to collect their guaranteed returns, and not compete with other sellers.  In addition, there is very little skew in the regulated performance curve; even mediocre plants earn enough to carry on their business.  Lack of direct competition also encourages sharing information, e.g., operating experience in the nuclear industry.  If competition is intense, sharing information is irresponsible and possibly dangerous to one's competitive position. (p. 61)

Do your senior managers compare their performance to some absolute scale, to other members of your fleet (if you're in one), to similar plants, to all plants, or the company's management compensation plan?

Will the decision produce rapid feedback and be repeated, or is it a one-off whose results will only become apparent over the long term?


Repetitive decisions, e.g., putting at golf, can benefit from deliberate practice, where performance feedback is used to adjust future decisions (action, feedback, adjustment, action).  This is related to the extensive training in the nuclear industry and the familiar do, check and adjust cycle ingrained in all nuclear workers.

However, most strategic decisions are unique or have consequences that will only manifest in the long-term.  In such cases, one has to make the most sound decision possible then take the best shot. 

Executives Make Decisions in a Social Setting

Senior managers depend on others to implement decisions and achieve results.  Leadership (exaggerated confidence, emphasizing certain data and beliefs over others, consistency, fairness and trust) is indispensable to inspire subordinates and shape culture.  Quoting Jack Welch, “As a leader, your job is to steer and inspire.” (p. 146)  “Effective leadership . . . means being sincere to a higher purpose and may call for something less than complete transparency.” (p. 158)

How about your senior managers?  Do they tell the whole truth when they are trying to motivate the organization to achieve performance goals?  If not, how does that impact trust over the long term?  
    
The Role of Confidence and Overconfidence

There is a good discussion of the overuse of the term “overconfidence,” which has multiple meanings but whose meaning in a specific application is often undefined.  For example, overconfidence can refer to being too certain that our judgment is correct, believing we can perform better than warranted by the facts (absolute performance) or believing we can outperform others (relative performance). 

Rosenzweig conducted some internet research on overconfidence.  The most common use in the business press was to explain, after the fact, why something had gone wrong. (p. 85)  “When we charge people with overconfidence, we suggest that they contributed to their own demise.” (p. 87)  This sounds similar to the search for the “bad apple” after an incident occurs at a nuclear plant.

But confidence is required to achieve high performance.  “What's the best level of confidence?  An amount that inspires us to do our best, but not so much that we become complacent, or take success for granted, or otherwise neglect what it takes to achieve high performance.” (p. 95)

Other Useful Nuggets

There is a good extension of the discussion of base rates and conditional probabilities introduced by Kahneman, including the full calculations for two of the conditional probability examples in Kahneman's Thinking, Fast and Slow (reviewed here).
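To make the base-rate idea concrete, here is a minimal sketch of the kind of Bayes' rule calculation being discussed.  The function is generic; the specific numbers are the classic “cab problem” figures often used to illustrate base-rate neglect, and they are our assumption, not necessarily the examples Rosenzweig recalculates.

```python
# Generic Bayes' rule calculation; the numbers below are the classic "cab
# problem" figures used to illustrate base-rate neglect (our assumption, not
# necessarily the examples Rosenzweig reworks).

def posterior(prior, true_positive_rate, false_positive_rate):
    """P(hypothesis | evidence) via Bayes' rule."""
    numerator = prior * true_positive_rate
    return numerator / (numerator + (1 - prior) * false_positive_rate)

# 15% of cabs are Blue (the base rate); a witness identifies the cab as Blue
# and is right 80% of the time (so misidentifies a Green cab as Blue 20% of the time).
p = posterior(prior=0.15, true_positive_rate=0.80, false_positive_rate=0.20)
print(f"P(cab was Blue | witness says Blue) = {p:.2f}")   # about 0.41
```

Despite the witness being right 80 percent of the time, the low base rate pulls the probability that the cab really was Blue down to about 41 percent, the kind of counterintuitive result that intuitive judgment tends to miss.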

The discussion on decision models notes that such models can be useful for overcoming common biases, analyzing large amounts of data and predicting elements of the future beyond our influence.  However, if we have direct influence, “Our task isn't to predict what will happen, but to make it happen.” (p. 189)

Other chapters cover decision making in a major corporate acquisition (focusing on bidding strategy) and in start-up businesses (focusing on a series of start-up decisions).

Our Perspective

Rosenzweig acknowledges that he is standing on the shoulders of Kahneman and other students of decision making.  But “An awareness of common errors and cognitive biases is only a start.” (p. 248)  The executive must consider the additional decision dimensions discussed above to properly frame his decision; in other words, he has to decide what he's deciding.

The direct applicability to nuclear safety culture may seem slight but we believe executives' values and beliefs, as expressed in the decisions they make over time, provide a powerful force on the shape and evolution of culture.  In other words, we choose to emphasize the transactional nature of leadership.  In contrast, Rosenzweig emphasizes its transformational nature: “At its core, however, leadership is not a series of discrete decisions, but calls for working through other people over long stretches of time.” (p. 164)  Effective leaders are good at both.

Of course, decision making and influence on culture is not the exclusive province of senior managers.  Think about your organization's middle managers—the department heads, program and project managers, and process owners.  How do they gauge their performance?  How open are they to new ideas and approaches?  How much confidence do they exhibit with respect to their own capabilities and the capabilities of those they influence? 

Bottom line, this is a useful book.  It's very readable, with many clear and engaging examples,  and has the scent of academic rigor and insight; I would not be surprised if it achieves commercial success.


*  P. Rosenzweig, Left Brain, Right Stuff: How Leaders Make Winning Decisions (New York: Public Affairs, 2014).

**  Referring to Lewis Carroll's Through the Looking Glass, this situation is sometimes called “Red Queen competition [which] means that a company can run faster but fall further behind at the same time.” (p. 57)

Monday, November 11, 2013

Engineering a Safer World: Systems Thinking Applied to Safety by Nancy Leveson

In this book* Leveson, an MIT professor, describes a comprehensive approach for designing and operating “safe” organizations based on systems theory.  The book presents the criticisms of traditional incident analysis methods, the principles of system dynamics, and essential safety-related organizational characteristics, including the role of culture, in one place; this review emphasizes those topics.  It should be noted the bulk of the book describes her accident causality model and how to apply it, including extensive case studies; this review does not fully address that material.

Part I
     
Part I sets the stage for a new safety paradigm.  Many contemporary socio-technical systems exhibit, among other characteristics, rapidly changing technology, increasing complexity and coupling, and pressures that put production ahead of safety. (pp. 3-6)   Traditional accident analysis techniques are no longer sufficient.  They too often focus on eliminating failures, esp. component failures or “human error,” instead of concentrating on eliminating hazards. (p. 10)  Some of Leveson's critique of traditional accident analysis echoes Dekker (esp. the shortcomings of Newtonian-Cartesian analysis, reviewed here).**   We devote space to Leveson's criticisms because she provides a legitimate perspective on techniques that comprise some of the nuclear industry's sacred cows.

Event-based models are simply inadequate.  There is subjectivity in selecting both the initiating event (the failure) and the causal chains backwards from it.  The root cause analysis often stops at the first root cause that is familiar, amenable to corrective action, difficult to get beyond (usually the human operator or other human role) or politically acceptable. (pp. 20-24)  Reason's Swiss cheese model is insufficient because of its assumption of direct, linear relationships between components. (pp. 17-19)  In addition, “event-based models are poor at representing systemic accident factors such as structural deficiencies in the organization, management decision making, and flaws in the safety culture of the company or industry.” (p. 28)

Probabilistic Risk Assessment (PRA) studies specified failure modes in ever greater detail but ignores systemic factors.  “Most accidents in well-designed systems involve two or more low-probability events occurring in the worst possible combination.  When people attempt to predict system risk, they explicitly or implicitly multiply events with low probability—assuming independence—and come out with impossibly small numbers, when, in fact, the events are dependent.  This dependence may be related to common systemic factors that do not appear in an event chain.  Machol calls this phenomenon the Titanic coincidence . . . The most dangerous result of using PRA arises from considering only immediate physical failures.” (pp. 34-35)  “. . . current [PRA] methods . . . are not appropriate for systems controlled by software and by humans making cognitively complex decisions, and there is no effective way to incorporate management or organizational factors, such as flaws in the safety culture, . . .” (p. 36) 
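The arithmetic behind this criticism is easy to illustrate.  The sketch below uses made-up numbers (not from the book) to compare the joint probability of two nominally rare failures when they are assumed independent versus when they share a common systemic factor such as schedule pressure or a flawed procedure.

```python
# Illustrative numbers only (not from the book).
p_a = 1.0e-3   # nominal probability of failure A
p_b = 1.0e-3   # nominal probability of failure B

# Naive estimate: multiply the probabilities, assuming independence.
p_joint_independent = p_a * p_b                        # 1.0e-6

# Alternative: both failures become much more likely under a shared systemic
# condition C (e.g., schedule pressure, a flawed procedure) present 1% of the time.
p_c = 0.01
p_a_given_c = 0.05    # assumed conditional probabilities given C
p_b_given_c = 0.05
p_joint_common_cause = p_c * p_a_given_c * p_b_given_c    # dominant term, 2.5e-5

print(f"assuming independence: {p_joint_independent:.1e}")
print(f"with a common cause:   {p_joint_common_cause:.1e}   (~25x larger)")
```

Under these assumed numbers the common-cause estimate is roughly 25 times larger than the independence-based one, which is the flavor of error Leveson (and Machol's “Titanic coincidence”) is pointing at.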

The search for operator error (a fall guy who takes the heat off of system designers and managers) and hindsight bias also contribute to the inadequacy of current accident analysis approaches. (p. 38)  In contrast to looking for an individual's “bad” decision, Leveson says “the study of decision making cannot be separated from a simultaneous study of the social context, the value system in which it takes place, and the dynamic work process it is intended to control.” (p. 46) 

Leveson says “Systems are not static. . . . they tend to involve a migration to a state of increasing risk over time.” (p. 51)  Causes include adaptation in response to pressures and the effects of multiple independent decisions. (p. 52)  This is reminiscent of  Hollnagel's warning that cost pressure will eventually push production to the edge of the safety boundary.

When accidents or incidents occur, Leveson proposes that analysis should search for reasons (the Whys) rather than blame (usually defined as Who) and be based on systems theory. (pp. 55-56)  In a systems view, safety is an emergent property, i.e., system safety performance cannot be predicted by analyzing system components. (p. 64)  Some of the goals for a better model include analysis that goes beyond component failures and human errors, is more scientific and less subjective, includes the possibility of system design errors and dysfunctional system interactions, addresses software, focuses on mechanisms and factors that shape human behavior, examines processes and allows for multiple viewpoints in the incident analysis. (pp. 58-60) 

Part II

Part II describes Leveson's proposed accident causality model based on systems theory: STAMP (Systems-Theoretic Accident Model and Processes).  For our purposes we don't need to spend much space on this material.  “The model includes software, organizations, management, human decision-making, and migration of systems over time to states of heightened risk.”***   It attempts to achieve the goals listed at the end of Part I.

STAMP treats safety in a system as a control problem, not a reliability one.  Specifically, the overarching goal “is to control the behavior of the system by enforcing the safety constraints in its design and operation.” (p. 76)  Controls may be physical or social, including cultural.  There is a good discussion of the hierarchy of control in a complex system and the impact of possible system dynamics, e.g., time lags, feedback loops and changes in control structures. (pp. 80-87)  “The process leading up to an accident is described in STAMP in terms of an adaptive feedback function that fails to maintain safety as system performance changes over time to meet a complex set of goals and values.” (p. 90)
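To give a feel for the dynamic being described, here is a toy simulation of our own construction (not STAMP itself, and not from the book): constant production pressure pushes an operating point away from its safe baseline while a safety control function reacts only to lagged information.  The lag lengths, gain and limit are arbitrary illustrative values.

```python
# Toy sketch (ours, not from the book): production pressure pushes performance
# away from its safe baseline; a safety control reacts to information that is
# 'lag_steps' old.  All numbers are arbitrary illustrative values.

def peak_migration(lag_steps, steps=60, pressure=1.0, gain=0.3):
    history = []     # migration from the safe baseline at each step
    x = 0.0
    for t in range(steps):
        observed = history[t - lag_steps] if t >= lag_steps else 0.0
        x = max(0.0, x + pressure - gain * observed)   # control acts on stale data
        history.append(x)
    return max(history)

LIMIT = 10.0   # assumed safety constraint on migration
for lag in (1, 5, 15):
    peak = peak_migration(lag)
    verdict = "constraint violated" if peak > LIMIT else "constraint held"
    print(f"reporting lag = {lag:2d} steps -> peak migration = {peak:5.1f} ({verdict})")
```

With a short reporting lag the control holds the migration to a modest equilibrium; with a long lag the system drifts well past the limit before the stale feedback catches up, which is the sort of adaptive-feedback failure STAMP is intended to surface.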

Leveson describes problems that can arise from an inaccurate mental model of a system or an inaccurate model displayed by a system.  There is a lengthy, detailed case study that uses STAMP to analyze a tragic incident, in this case a friendly fire accident where a U.S. Army helicopter was shot down by an Air Force plane over Iraq in 1994.

Part III

Part III describes in detail how STAMP can be applied.  There are many useful observations (e.g., problems with mode confusion on pp. 289-94) and detailed examples throughout this section.  Chapter 11 on using a STAMP-based accident analysis illustrates the claimed advantages of  STAMP over traditional accident analysis techniques. 

We will focus on chapter 13, “Managing Safety and the Safety Culture,” which covers the multiple dimensions of safety management, including safety culture.

Leveson's list of the components of effective safety management is mostly familiar: management commitment and leadership, safety policy, communication, strong safety culture, safety information system, continual learning, education and training. (p. 421)  Two new components need a bit of explanation: a safety control structure and controls on system migration toward higher risk.  The safety control structure assigns specific safety-related responsibilities to management, system designers and operators. (pp. 436-40)  One of the control structure's responsibilities reflects the requirement that “the potential reasons for and types of migration toward higher risk need to be identified and controls instituted to prevent it.” (pp. 425-26)  Such an approach should be based on the organization's comprehensive hazards analysis.****

The safety culture discussion is also familiar. (pp. 426-33)  Leveson refers to the Schein model, discusses management's responsibility for establishing the values to be used in decision making, the need for open, non-judgmental communications, the freedom to raise safety questions without fear of reprisal and widespread trust.  In such a culture, Leveson says an early warning system for migration toward states of high risk can be established.  A section on Just Culture is taken directly from Dekker's work.  The risk of complacency, caused by inaccurate risk perception after a long history of success, is highlighted.

Although these management and safety culture contents are generally familiar, what's new is relating them to systems concepts such as control loops and feedback and taking a systems view of the safety control system.

Our Perspective
 

Overall, we like this book.  It is Leveson's magnum opus, 500+ pages of theory, rationale, explanation, examples and infomercial.  The emphasis on the need for a systems perspective and a search for Why accidents/incidents occur (as opposed to What happened or Who is at fault) is consistent with what we've been saying on this blog.  The book explains and supports many of the beliefs we have been promoting on Safetymatters: the shortcomings of traditional (but commonly used) methods of incident investigation; the central role of decision making; and how management commitment, financial and non-financial rewards, and a strong safety culture contribute to system safety performance.
 

However, there are only a few direct references to nuclear.  The examples in the book are mostly from aerospace, aviation, maritime activities and the military.  Establishing a safety control structure is probably easier to accomplish in a new aerospace project than in an existing nuclear organization with a long history (aka memory),  shifting external pressures, and deliberate incremental changes to hardware, software, policies, procedures and programs.  Leveson does mention John Carroll's (her MIT colleague) work at Millstone. (p. 428)  She praises nuclear LER reporting as a mechanism for sharing and learning across the industry. (pp. 406-7)  In our view, LERs should be helpful but they are short on looking at why incidents occur, i.e., most LER analysis does not look at incidents from a systems perspective.  TMI is used to illustrate specific system design/operation problems.
 

We don't agree with the pot shots Leveson takes at High Reliability Organization (HRO) theorists.  First, she accuses HRO of confusing reliability with safety, in other words, an unsafe system can function very reliably. (pp. 7, 12)  But I'm not aware of any HRO work that has been done in an organization that is patently unsafe.  HRO asserts that reliability follows from practices that recognize and contain emerging problems.  She takes another swipe at HRO when she says HRO suggests that, during crises, decision making migrates to frontline workers.  Leveson's problem with that is “the assumption that frontline workers will have the necessary knowledge and judgment to make decisions is not necessarily true.” (p. 44)  Her position may be correct in some cases but as we saw in our review of CAISO, when the system was veering off into new territory, no one had the necessary knowledge and it was up to the operators to cope as best they could.  Finally, she criticizes HRO advice for operators to be on the lookout for “weak signals.”  In her view, “Telling managers and operators to be “mindful of weak signals” simply creates a pretext for blame after a loss event occurs.” (p. 410)  I don't think it's pretext but it is challenging to maintain mindfulness and sense faint signals.  Overall, this appears to be academic posturing and feather fluffing.
 

We offer no opinion on the efficacy of using Leveson's STAMP approach.  She is quick to point out a very real problem in getting organizations to use STAMP: its lack of focus on finding someone/something to blame means it does not help identify subjects for discipline, lawsuits or criminal charges. (p. 86)
 

In Leveson's words, “The book is written for the sophisticated practitioner . . .” (p. xviii)  You don't need to run out and buy this book unless you have a deep interest in accident/incident analysis and/or are willing to invest the time required to determine exactly how STAMP might be applied in your organization.


*  N.G. Leveson, Engineering a Safer World: Systems Thinking Applied to Safety (The MIT Press, Cambridge, MA: 2011)  The link goes to a page where a free pdf version of the book can be downloaded; the pdf cannot be copied or printed.  All quotes in this post were retyped from the original text.


**  We're not saying Dekker or Hollnagel developed their analytic viewpoints ahead of Leveson; we simply reviewed their work earlier.  These authors are all aware of others' publications and contributions.  Leveson includes Dekker in her Acknowledgments and draws from Just Culture: Balancing Safety and Accountability in her text. 

***  Nancy Leveson informal bio page.


****  “A hazard is a system state or set of conditions that, together with a particular set of worst-case environmental conditions, will lead to an accident.” (p. 157)  The hazards analysis identifies all major hazards the system may confront.  Baseline safety requirements follow from the hazards analysis.  Responsibilities are assigned to the safety control structure for ensuring baseline requirements are not violated while allowing changes that do not raise risk.  The identification of system safety constraints allows the possibility of identifying leading indicators for a specific system. (pp. 337-38)

Monday, October 14, 2013

High Reliability Management by Roe and Schulman

This book* presents a multi-year case study of the California Independent System Operator (CAISO), the government entity created to operate California's electricity grid when the state deregulated its electricity market.  CAISO's travails read like The Perils of Pauline but our primary interest lies in the authors' observations of the different grid management strategies CAISO used under various operating conditions; it is a comprehensive description of contingency management in the real world.  In this post we summarize the authors' management model, discuss the application to nuclear management and opine on the implications for nuclear safety culture.

The High Reliability Management (HRM) Model

The authors call the model they developed High Reliability Management and present it in a 2x2 matrix where the axes are System Volatility and Network Options Variety. (Ch. 3)  System Volatility refers to the magnitude and rate of change of  CAISO's environmental variables including generator and transmission availability, reserves, electricity prices, contracts, the extent to which providers are playing fair or gaming the system, weather, temperature and electricity demand (regional and overall).  Network Options Variety refers to the range of resources and strategies available for meeting demand (basically in real time) given the current inputs. 

System Volatility and Network Options Variety can each be High or Low so there are four possible modes and a distinctive operating management approach for each.  All modes must address CAISO's two missions of matching electricity supply and demand, and protecting the grid.  Operators must manage the system inside an acceptable or tolerable performance bandwidth (invariant output performance is a practical impossibility) in all modes.  Operating conditions are challenging: supply and demand are inherently unstable (p. 34), inadequate supply means some load cannot be served and too much generation can damage the grid. (pp. 27, 142)

High Volatility and High Options mean both generation (supply) and demand are changing quickly and the operators have multiple strategies available for maintaining balance.  Some strategies can be substituted for others.  It is a dynamic but manageable environment.

High Volatility and Low Options mean both generation and demand are changing quickly but the operators have few strategies available for maintaining balance.  They run from pillar to post; it is highly stressful.  Sometimes they have to create ad hoc (undocumented and perhaps untried) approaches using trial and error.  Demand can be satisfied but regulatory limits may be exceeded and the system is running closer to the edge of technical capabilities and operator skills.  It is the most unstable performance mode and untenable because the operators are losing control and one perturbation can amplify into another. (p. 37)

Low Volatility and Low Options mean generation and demand are not changing quickly.  The critical feature here is demand has been reduced by load shedding.  The operators have exhausted all other strategies for maintaining balance.  It is a command-and-control approach, effected by declaring a  Stage 3 grid situation and run using formal rules and procedures.  It is the least desirable domain because one primary mission, to meet all demand, is not being accomplished. 

Low Volatility and High Options is an HRM's preferred mode.  Actual demand follows the forecast, generators are producing as expected, reserves are on hand, and there is no congestion on transmission lines or backup routes are available.  Procedures based on analyzed conditions exist and are used.  There are few, if any, surprises.  Learning can occur but it is incremental, the result of new methods or analysis.  Performance is important and system behavior operates within a narrow bandwidth.  Loss of attention (complacency) is a risk.  Is this starting to sound familiar?  This is the domain of High Reliability Organization (HRO) theory and practice.  Nuclear power operations is an example of an HRO. (pp. 60-62)          
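As a compact summary of the framework just described, here is a small sketch that encodes the authors' 2x2 matrix as a lookup table.  The mode descriptions are our paraphrase of the text above, not code or wording from the book.

```python
# Our paraphrase of Roe and Schulman's High Reliability Management matrix:
# (system volatility, network options variety) -> operating mode summary.
from enum import Enum

class Level(Enum):
    LOW = "low"
    HIGH = "high"

HRM_MODES = {
    (Level.HIGH, Level.HIGH): "Dynamic but manageable; multiple substitutable strategies available.",
    (Level.HIGH, Level.LOW):  "Pillar-to-post, ad hoc trial and error near the edge; untenable.",
    (Level.LOW,  Level.LOW):  "Stage 3 command-and-control with load shedding; demand not fully met.",
    (Level.LOW,  Level.HIGH): "Preferred, HRO-like mode; procedures and few surprises, but complacency risk.",
}

def mode(volatility: Level, options_variety: Level) -> str:
    return HRM_MODES[(volatility, options_variety)]

print(mode(Level.LOW, Level.HIGH))   # the mode nuclear plants work hard to stay in
```

Nuclear operations, as the authors note, lives almost entirely in that last entry.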

Lessons for Nuclear Operations 


Nuclear plants work hard to stay in the Low Volatility/High Options mode.  If they stray into the Low Options column, they run the risks of facing unanalyzed situations and regulatory non-compliance. (p. 62)  In their effort to optimize performance in the desired mode, plants examine their performance risks to ever finer granularity through new methods and analyses.  Because of the organizations' narrow focus, few resources are directed at identifying, contemplating and planning for very low probability events (the tails of distributions) that might force a plant into a different mode or have enormous potential negative consequences.**  Design changes (especially new technologies) that increase output or efficiency may mask subtle warning signs of problems; organizations must be mindful of performance drift and nascent problems.

In an HRO, trial and error is not an acceptable method for trying out new options.  No one wants cowboy operators in the control room.  But examining new options using off-line methods, in particular simulation, is highly desirable. (pp. 111, 233)  In addition, building reactive capacity in the organization can be a substitute for foresight to accommodate the unexpected and unanalyzed. (pp. 116-17)  

The focus on the external changes that buffeted CAISO leads to a shortcoming when looking for lessons for nuclear.  The book emphasizes CAISO's adaptability to new environmental demands, requirements and constraints but does not adequately recognize the natural evolution of the system.  In nuclear, it's natural evolution that may quietly lead to performance drift and normalization of deviance.  In a similar vein, CAISO has to worry about complacency in just one mode, for nuclear it's effectively the only mode and complacency is an omnipresent threat. (p. 126) 

The risk of cognitive overload occurs more often for CAISO operators but it has visible precursors; for nuclear operators the risk is that overload might occur suddenly and with little or no warning.*** Anticipation and resilience are more obvious needs at CAISO but also necessary in nuclear operations. (pp. 5, 124)

Implications for Safety Culture

Both HRMs and HROs need cultures that value continuous training, open communications, team players able to adjust authority relationships when facing emergent issues, personal responsibility for safety (i.e., safety does not inhere in technology), ongoing learning to do things better and reduce inherent hazards, rewards for achieving safety and penalties for compromising it, and an overall discipline dedicated to failure-free performance. (pp. 198, App. 2)  Both organizational types need a focus on operations as the central activity.  Nuclear is good at this, certainly better than CAISO where entities outside of operations promulgated system changes and the operators were stuck with making them work.

The willingness to report errors should be encouraged but we have seen that is a thin spot in the SC at some plants.  Errors can be a gateway into learning how to create more reliable performance and error tolerance vs. intolerance is a critical cultural issue. (pp. 111-12, 220) 

The simultaneous needs to operate within a prescribed envelope while considering how the envelope might be breached has implications for SC.  We have argued before that a nuclear organization is well-served by having a diversity of opinions and some people who don't subscribe to group think and instead keep asking “What's the worst case scenario and how would we manage it to an acceptable conclusion?” 

Conclusion

This review gives short shrift to the authors' broad and deep description and analysis of CAISO.****  The reason is that the major takeaway for CAISO, viz., the need to recognize mode shifts and switch management strategies accordingly as the manifestation of “normal” operations, is not really applicable to day-to-day nuclear operations.

The book describes a rare breed, the socio-technical-political start-up, and has too much scope for the average nuclear practitioner to plow through searching for newfound nuggets that can be applied to nuclear management.  But it's a good read and full of insightful observations, e.g., the description of  CAISO's early days (ca. 2001-2004) when system changes driven by engineers, politicians and regulators, coupled with changing challenges from market participants, prevented the organization from settling in and effectively created a negative learning curve with operators reporting less confidence in their ability to manage the grid and accomplish the mission in 2004 vs. 2001. (Ch. 5)

(High Reliability Management was recommended by a Safetymatters reader.  If you have a suggestion for material you would like to see promoted and reviewed, please contact us.)

*  E. Roe and P. Schulman, High Reliability Management (Stanford Univ. Press, Stanford, CA: 2008)  This book reports the authors' study of CAISO from 2001 through 2006. 

**  By their nature as baseload generating units, usually with long-term sales contracts, nuclear plants are unlikely to face a highly volatile business environment.  Their political and social environment is similar: The NRC buffers them from direct interference by politicians although activists prodding state and regional authorities, e.g., water quality boards, can cause distractions and disruptions.

The importance of considering low-probability, major consequence events is argued by Taleb (see here) and Dédale (see here).

***  Over the course of the authors' investigation, technical and management changes at CAISO intended to make operations more reliable often had the unintended effect of moving the edge of the prescribed performance envelope closer to the operators' cognitive and skill capacity limits. 

The Cynefin model describes how organizational decision making can suddenly slip from the Simple domain to the Chaotic domain via the Complacent zone.  For more on Cynefin, see here and here.

****  For instance, ch. 4 presents a good discussion of the inadequate or incomplete applicability of Normal Accident Theory (Perrow, see here) or High Reliability Organization theory (Weick, see here) to the behavior the authors observed at CAISO.  As an example, tight coupling (a threat according to NAT) can be used as a strength when operators need to stitch together an ad hoc solution to meet demand. (p. 135)

Ch. 11 presents a detailed regression analysis linking volatility in selected inputs to volatility in output, measured by the periods when electricity made available (compared to demand) fell outside regulatory limits.  This analysis illustrated how well CAISO's operators were able to manage in different modes and how close they were coming to the edge of their ability to control the system, in other words, performance as precursor to the need to go to Stage 3 command-and-control load shedding.

Tuesday, September 17, 2013

Even Macy’s Does It

We have long been proponents of looking for innovative ways to improve safety management training for nuclear professionals.  We’ve taken on the burden of developing a prototype management simulator, NuclearSafetySim, and made it available for our readers to experience for themselves (see our July 30, 2013 post).  In the past we have also noted other industries and organizations that have embraced simulation as an effective management training tool.

An August article in the Wall Street Journal* cites several examples of new approaches to manager training.  Most notable in our view is Macy’s use of simulations to have managers gain decision making experience.  As the article states:

“The simulation programs aim to teach managers how their daily decisions can affect the business as a whole.”

We won’t revisit all the arguments that we’ve made for taking a systems view of safety management, focusing on decisions as the essence of safety culture and using simulation to allow personnel to actualize safety values and priorities.  All of these could only enrich, challenge and stimulate training activities. 

A Clockwork Magenta

 
On the other hand, what is the value of training approaches that reiterate INPO slide shows, regulatory policy statements and good practices in seemingly endless iterations?  Brings to mind the character Alex, the incorrigible sociopath in A Clockwork Orange with an unusual passion for classical music.**  He is the subject of “reclamation treatment”, head clamped in a brace and eyes pinned wide open, forced to watch repetitive screenings of anti-social behavior to the music of Beethoven’s Ninth.  We are led to believe this results in a “cure” but does it and at what cost?

Nuclear managers may not be treated exactly like Alex but there are some similarities.  After plant problems occur and are diagnosed, managers are also declared “cured” after each forced feeding of traits, values, and the need for increased procedure adherence and oversight.  Results still not satisfactory?  Repeat.



*  R. Feintzeig, "Building Middle-Manager Morale," Wall Street Journal (Aug. 7, 2013).  Retrieved Sept. 24, 2013.

**  M. Amis, "The Shock of the New:‘A Clockwork Orange’ at 50,"  New York Times Sunday Book Review (Aug. 31, 2013).  Retrieved Sept. 24, 2013.

Thursday, August 15, 2013

No Innocent Bystanders

The stake that sticks up gets hammered down.
We recently saw an article* about organizational bystander behavior.  Organizational bystanders are people who sense or believe that something is wrong—a risk is increasing or a hazard is becoming manifest—but they don't force their organization to confront the issue or they only halfheartedly pursue it.**  This is a significant problem in high-hazard activities; it seems that after a serious incident occurs, there is always someone, or even several someones, who knew the incident's causes existed but didn't say anything.  Why don't these people speak up?

The authors describe psychological and organizational factors that encourage bystander behavior.  Psychological factors are rooted in uncertainty, observing the failure of others to act and the expectation that expert or formal authorities will address the problem.  Fear is a big factor: fear of being wrong, fear of being chastised for thinking above one's position or outside one's field of authority, fear of being rejected by the work group even if one's concerns are ultimately shown to be correct or fear of being considered disloyal; in brief, fear of the dominant culture. 

Organizational factors include the processes and constraints the organization uses to filter information and make decisions.  Such factors include limiting acceptable information to that which comports with the organization's basic assumptions, and rigid hierarchical and role structures—all components of the organization's culture.  Other organizational factors, e.g., resource constraints and external forces, apply pressure on the culture.  In one type of worst case, “imposing nonnegotiable performance objectives combined with severe sanctions for failure encourages the violation of rules, reporting distortions, and dangerous, sometimes illegal short-cuts.” (p. 52)  Remember Massey Energy and the Upper Big Branch mine disaster?

The authors provide a list of possible actions to mitigate the likelihood of bystander behavior.  Below we recast some of these actions as desirable organizational (or cultural) attributes.

  • Mechanisms exist for encouraging and expressing dissenting points of view;
  • Management systems balance the need for short-term performance with the need for productive inquiry into potential threats;
  • Approaches exist to follow up on near-misses and other “weak signals” [an important attribute of high reliability organizations];
  • Disastrous but low probability events are identified and contingency plans prepared;
  • Performance reviews, self-criticism, and a focus on learning at all levels are required.
Even in such a better world, “bystander behavior is not something that can be 'fixed' once and for all, as it is a natural outgrowth of the interplay of human psychology and organizational forces. The best we can hope for is to manage it well, and, by so doing, help to prevent catastrophic outcomes.” (p.53) 

Our Perspective

This paper presents a useful discussion of the interface between the individual and the organization under problematic conditions, viz., when the individual sees something that may be at odds with the prevailing world view.  It's important to realize that even if the organizational factors are under control, many people will still be reluctant to rock the boat, even though the risk they see is to the boat itself.

The authors correctly emphasize the important role of leadership in developing the desirable organizational attributes; however, as we have argued elsewhere, leadership can influence, but not unilaterally specify, organizational culture.

We would like to see more discussion of systemic processes.  For example, the impact of possible negative feedback on the individual is described but positive feedback, such as through the compensation, recognition and reward systems, is not discussed.  Organizational learning (adaptation) is mentioned but not well developed.

The article mentions the importance of independent watchdogs.  We note that in the nuclear industry, the regulator plays an important role in encouraging bystanders to get involved and protecting them if they do.

The article concludes with a section on the desirable contributions of the human resources (HR) department.  It is, quite frankly, unrealistic (it overstates the role and authority of HR in nuclear organizations I have seen) but was probably necessary to get the article published in an HR journal. 


*  M.S. Gerstein and R.B. Shaw, “Organizational Bystanders,” People and Strategy 31, no. 1 (2008), pp. 47-54.  Thanks to Madalina Tronea for publicizing this article on the LinkedIn Nuclear Safety group.  Dr. Tronea is the group's founder/manager.

**  This is a bit different from the classic bystander effect which refers to a situation where the more people present when help is needed, the less likely any one of them is to provide the help, each one expecting others to provide assistance. 

Wednesday, August 7, 2013

Nuclear Industry Scandal in South Korea

As you know, over the past year trouble has been brewing in the South Korean nuclear industry.  A recent New York Times article* provides a good current status report.  The most visible problem is the falsification of test documents for nuclear plant parts.  Executives have been fired, employees of both a testing company and the state-owned entity that inspects parts and validates their safety certificates have been indicted.

It should be no surprise that the underlying causes are rooted in the industry structure and culture.  South Korea has only one nuclear utility, state-owned Korea Electric Power Corporation (Kepco).  Kepco retirees go to work for parts suppliers or invest in them.  Cultural attributes include valuing personal ties, including school and hometown connections, over regulations.  Bribery is used as a lubricating agent.

As a consequence,  “In the past 30 years, our nuclear energy industry has become an increasingly closed community that emphasized its specialty in dealing with nuclear materials and yet allowed little oversight and intervention,” the government’s Ministry of Trade, Industry and Energy said in a recent report to lawmakers. “It spawned a litany of corruption, an opaque system and a business practice replete with complacency.”

Couldn't happen here, right?  I hope not, but the U.S. nuclear industry, while not as closed a system as its Korean counterpart, is hardly an open community.  The “unique and special” mantra promotes insular thinking and encourages insiders to view outsiders with suspicion.  The secret practices of the industry's self-regulator do not inspire public confidence.  A familiar cast of NEI/INPO participants at NRC stakeholder meetings fuels concern over the degree to which the NRC has been captured by industry.  Utility business decisions that ultimately killed plants (CR3, Kewaunee, San Onofre) appear to have been made in conference rooms isolated from any informed awareness of worst-case technical/commercial consequences.  Our industry has many positive attributes, but others should cause us to stop and reflect.

*  C. Sang-Hun, “Scandal in South Korea Over Nuclear Revelations,” New York Times (Aug. 3, 2013).  Retrieved Aug. 6, 2013.