Tuesday, April 2, 2024

Systems Engineering’s Role in Addressing Society’s Problems

Guru Madhavan, a National Academy of Engineering senior scholar, has a new book about how engineering can contribute to solving society’s most complex and intractable problems.  He published a related article* on the National Academies website.  The author describes four different types of problems, i.e., decision situations.  Importantly, he advocates a systems engineering** perspective for addressing each type.  We will summarize his approach and provide our perspective on it.

He begins with a metaphor of clocks and clouds.  Clocks operate on logical principles and underlie much of our physical world.  Clouds form and reform; no two are alike; they defy logic; only their momentary appearance is real – a metaphor for many of our complex social problems.
 
Hard problems

Hard problems can be essentially bounded.  The systems engineer can identify components, interrelationships, processes, desired outcomes, and measures of performance.  The system can be optimized by applying mathematics, scientific knowledge, and experience.  The system designers’ underlying belief is that a best outcome exists and is achievable.  In our view, this is a world of clocks.

Soft problems

Soft problems arise in the field of human behavior, which is complicated by political and psychological factors.  Because goals may be unclear, and constraints complicate system design, soft problems cannot be solved like hard problems.

Soft problems involve technology, psychology, and sociology and resolving them may yield an outcome that’s not the best (optimal) but good enough.  Results are based on satisficing, an approach that satisfies and suffices.  We’d say clouds are forming overhead.
 
Messy problems

Messy problems emerge from divisions created by people’s differing value sets, belief systems, ideologies, and convictions.  An example would be trying to stop the spread of a pathogen while respecting a culture’s traditional burial practices.  In these situations, the system designer must try to transform the nature of the entity and/or its environment by dissolving the problem into manageable elements and moving them toward a desired state in which the problem no longer arises.  In the example above, this might mean creating dignified burial rituals and promoting safe public health practices.

Wicked problems

The cloudiest problems are the “wicked” ones.  A wicked problem emerges when hard, soft, and messy problems exist simultaneously.  This means optimal solutions, satisficing resolutions, and dissolution may also co-exist.  A comprehensive model of a wicked problem might show solution(s) within a resolution, and a dissolution might contain resolutions and solutions.  As a consequence, engineers need to possess “competency—and consciousness— . . . to develop a balanced blend of hard solutions, soft resolutions, and messy dissolutions to wicked problems.”

Our perspective

People form their mental models of the world based on their education, training, and lived experiences.  These mental models are representations of how the world works.  They are usually less than totally accurate because of people’s cognitive limitations and built-in biases.

We have long argued that technocrats who traditionally manage and operate complicated industrial facilities, e.g., nuclear power plants, have inadequate mental models, i.e., they are clock people.  Their models are limited to cause-effect thinking; their focus is on fixing the obvious hard problems in front of them.  As a result, their fixes are limited: change a procedure or component design, train harder, supervise more closely, and apply discipline, including getting rid of the bad apples, as necessary.  Rinse and repeat.

In contrast, we assert that problem solving must recognize the existence of complex socio-technical systems.  Fixes need to address both physical issues and psychological and social concerns.  Analysts must consider relationships between hard and soft system components.  Problem solvers need to be cloud people.  

Proper systems thinking understands that problems seldom exist in isolation.  They are surrounded by a task environment that may contain conflicting goals (e.g., production vs. safety) and a solution space limited by company policies, resource limitations, and organizational politics.  The external legal-political environment can also influence goals and further constrain the solution space.

Madhavan has provided some good illustrations of mental models for problem solving, starting with the (relatively) easiest “hard” physical problems and moving through more complicated models to the realm of wicked problems that may, in some cases, be effectively unsolvable.

Bottom line: this is a good refresher for people who are already systems thinkers and a good introduction for people who aren’t.


*  G. Madhavan, “Engineering Our Wicked Problems,” National Academy of Engineering Perspectives (March 6, 2024).  Online only.

**  In Madhavan’s view, systems engineering considers all facets of a problem, recognizes sensitivities, shapes synergies, and accounts for side effects.

Saturday, March 2, 2024

Boeing’s Safety Culture Under the FAA’s Microscope

The Federal Aviation Administration (FAA) recently released its report* on the safety culture (SC) at Boeing.  The FAA Expert Panel was tasked with reviewing SC after two crashes involving the latest models of Boeing’s 737 MAX airplanes.  The January 2024 door plug blowout happened as the report was nearing completion and reinforces the report’s findings.

737 MAX door plug

The report has been summarized and widely reported in mainstream media and we will not review all its findings and recommendations here.  We want to focus on two parts of the report that address topics we have long promoted as keys to understanding how strong (or weak) an organization’s SC is, viz., its decision-making processes and executive compensation.  In addition, we will discuss a topic that’s new to us: how to ensure the independence of employees whose work includes assessing company work products from the regulator’s perspective.

Decision-making

An organization’s decision-making processes create some of the most visible artifacts of the organization’s culture: a string of decisions (guided by policies, procedures, and priorities) and their consequences.

The report begins with a clear FAA description of decision-making’s important role in a Safety Management System (SMS) and an organization’s overall management.  In part, an “SMS is all about decision-making. Thus it has to be a decision-maker's tool, not a traditional safety program separate and distinct from business and operational decision making.” (p. 10)

However, the panel’s finding on Boeing’s SMS is a mixed bag.  “Boeing provided evidence that it is using its SMS to evaluate product safety decisions and some business decisions. The Expert Panel’s review of Boeing’s SMS documentation revealed detailed procedures on how to use SMS to evaluate product safety decisions, but there are no detailed procedures on how to determine which business decisions affect safety or how they should be evaluated under SMS.” (emphasis added) (p. 35)

The associated recommendation is “Develop detailed procedures to determine which business activities should be evaluated under SMS and how to evaluate those decisions.” (ibid.)  We think the recommendation addresses the specific problem identified in the finding.

One of the major inputs to a decision-making system is an organization’s priorities.  The FAA says safety should always be the top priority, but Boeing’s commitment to safety has arguably weakened over time.

“Boeing provided the Expert Panel with a copy of the Boeing Safety Management System Policy, dated April 2022, which states, in part, ‘… we make safety our top priority.’ Boeing revised this policy in August 2023 with . . . a change to the message ‘we make safety our top priority’ to ‘safety is our foundation.’” (p. 29)

Lowering the bar did not help.  “The [Expert] panel observed documentation, survey responses, and employee interviews that did not provide objective evidence of a foundational commitment to safety that matched Boeing’s descriptions of that objective.” (p. 22)

Boeing also sowed seeds of confusion for its safety decision makers by implementing its SMS to operate alongside (and not replace or integrate with) its existing safety program.

“During interviews, Boeing employees highlighted that SMS implementation was not to disrupt existing safety program or systems.  SMS operating procedure documents spoke of SMS as the overarching safety program but then also provided segregation of SMS-focused activities from legacy safety activities . . .” (p. 24)

Executive compensation

We have long said that if safety performance is important to an organization then its senior managers’ compensation should have a safety performance-related component.

Boeing has included safety in its executive financial incentive program.  Safety is one of five factors comprising operational performance which, in turn, is combined with financial performance to determine company-level performance.  Because of the weights used in the incentive model, “The Product Safety measure comprised approximately 4% of the overall 2022 Annual Incentive Award.” (p. 28)
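
To see how a single safety measure can end up as such a thin slice of the award, here is a minimal arithmetic sketch in Python.  The only figure taken from the report is the ~4% end result; the equal weighting of the five operational factors and the 20/80 operational-versus-financial split are illustrative assumptions on our part, not Boeing’s actual formula.

    # Hypothetical incentive weights -- only the ~4% end result comes from the FAA report.
    operational_factors = {          # assumed: five factors, equally weighted
        "product_safety": 0.20,
        "other_factor_1": 0.20,
        "other_factor_2": 0.20,
        "other_factor_3": 0.20,
        "other_factor_4": 0.20,
    }
    operational_share_of_award = 0.20   # assumed split between operational
    financial_share_of_award = 0.80     # and financial performance
    assert operational_share_of_award + financial_share_of_award == 1.0

    # Safety's slice of the overall award is its weight within the operational
    # score times the operational score's weight in the blended award.
    safety_share = operational_factors["product_safety"] * operational_share_of_award
    print(f"Product safety share of overall award: {safety_share:.0%}")   # prints 4%

Under these assumed weights, safety’s contribution to the award is diluted twice: once within the operational score and again when operational performance is blended with financial performance.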

Is 4% enough to influence executive behavior?  You be the judge.

Employee independence from undue management influence   

Boeing’s relationship with the FAA has an aspect that we don’t see in other industries. 

Boeing holds an Organization Designation Authorization (ODA) from the FAA. This allows Boeing to “make findings and issue certificates, i.e., perform discretionary functions in engineering, manufacturing, operations, airworthiness, or maintenance on behalf of the [FAA] Administrator.” (p. 12)

Basically, the FAA delegates some of its authority to Boeing employees, the ODA Unit Members (UMs), who then perform certain assessment and certification tasks.  “When acting as a representative of the Administrator, an individual is required to perform in a manner consistent with the policies, guidelines, and directives of the FAA. When performing a delegated function, an individual is legally distinct from, and must act independent of, the ODA holder.” (ibid.)  These employees are supposed to take the FAA’s view of situations and apply the FAA’s rules even if the FAA’s interests are in conflict with Boeing’s business interests. 

This might work in a perfect world, but in Boeing’s world it has had, and still has, problems, primarily “Boeing’s restructuring of the management of the ODA unit decreased opportunities for interference and retaliation against UMs, and provides effective organizational messaging regarding independence of UMs. However, the restructuring, while better, still allows opportunities for retaliation to occur, particularly with regards to salary and furlough ranking.” (emphasis added) (p. 5)  In addition, “The ability to comply with the ODA’s approved procedures is present; however, the integration of the SMS processes, procedures, and data collection requirements has not been accomplished.” (p. 26)

To an outsider, this looks like bad organizational design and practices. 

The U.S. commercial nuclear industry offers a useful contrast.  The regulator, the Nuclear Regulatory Commission (NRC), expects its licensees to follow established procedures, perform required tests and inspections, and report any problems to the NRC.  Self-reporting is key to an effective relationship built on a base of trust.  However, it’s “trust but verify.”  The NRC has its own full-time employees in all the power plants, performing inspections, monitoring licensee operations, and interacting with licensee personnel.  The inspectors’ findings can lead, and have led, to increased NRC oversight of licensee activities.

Our perspective

It’s obvious that Boeing has emphasized production over safety.  The problems described above are evidence of broad systemic issues which are not amenable to quick fixes.  Integrating SC into everyday decision-making is hard work of the “continuous improvement” variety; it will not happen by management fiat.  Adjusting the compensation plan will require the Board to take safety more seriously.  Reworking the ODA program to eliminate all pressures and goal conflicts may not be possible; this is a big problem because the FAA has effectively deputized 1,000 people to perform FAA functions at Boeing. (p. 25)

The report only covers the most visible SC issues.  Complacency, normalization of deviation, the multitude of biases that can affect decision-making, and other corrosive factors are perennial threats to a strong SC and can affect “the natural drift in organizations.” (p. 40)  Such drift may lead to everything from process inefficiencies to tragic safety failures.

Boeing has taken one step: it fired the head of the 737 MAX program.**  Organizations often toss a high-level executive into a volcano to appease the regulatory gods and buy some time.  Boeing’s next challenge: the FAA has given the company 90 days to fix the quality problems highlighted by the door plug blowout.***

Bottom line: Grab your popcorn, the show is just starting.  Boeing is probably too big to fail but it is definitely going to be pulled through the wringer. 


*  “Section 103 Organization Designation Authorizations (ODA) for Transport Airplanes Expert Panel Review Report,” Federal Aviation Administration (Feb. 26, 2024).

**  N. Robertson, “Boeing fires head of 737 Max program,” The Hill (Feb. 21, 2024).

***  D. Shepardson and V. Insinna, “FAA gives Boeing 90 days to develop plan to address quality issues,” Reuters (Feb. 28, 2024).