The Space Shuttle Columbia Disaster: Clarity of Purpose and its Impact on Risk
by Frank Winters, first published on the Gantthead website
Introduction
Space
Shuttle Mission STS-107 ended in disaster on February 1, 2003, cutting
short the lives of the seven remarkable crew members. The United States
and most of the world are in mourning over this tragic event. Once again,
the United States space program is being subjected to criticism, costly
delays and close scrutiny.
One
of the factors contributing to controversy over the Space Shuttle
Program has been a perceived lack of clarity of purpose. This lack of
clarity has made it difficult to determine an acceptable level of risk.
We have heard many people speak out in favor of continuing the space
program. Retired astronauts, politicians, NASA administrators and
journalists have said we must continue space exploration because mankind
will always seek to understand the unknown, will continue to push the
envelope and will continue to reap many benefits from this important
work. Most advocates say we know there are risks but we are willing to
accept risk given the importance of the mission. But what is the risk,
what is acceptable risk and what exactly is the mission?
Where
does the execution of the Space Shuttle Program stand? To quote NASA’s
website, “STS stands for ‘Space Transportation System’, a.k.a. the
Space Shuttle, and ‘107’ is the ‘flight tail number’ or the
107th flight of the Space Shuttle, although the order of missions may
have changed after assignment of the flight number.” So by
nomenclature design, STS-107 should have been the 107th flight, but in
practice that’s not exactly correct. According to an article in
Sunday’s online edition of the New York Times, there have
actually been a total of 113 flights of the Space Shuttle. (Other
articles say there have been 111 flights.)
Therefore,
the tragic ending to STS-107 is the second catastrophic result out of
about 113 flights.
Many
questions arise: Was this latest disaster an unexpected accident, a
function of the inherent risk of space travel, a result of human failure
or an incident in the development of an experimental mode of
transportation? Should we be shocked and surprised when a shuttle
mission ends in disaster? Did the astronauts realize that the
probability of catastrophic failure was so high? Is a 2-in-113
probability of loss of crew and vehicle an acceptable risk level, given
the nature of the mission? Is this probability about what NASA expected?
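To make the 2-in-113 figure concrete, here is a minimal sketch of the observed loss rate. It uses only the flight counts quoted above (113 and 111); the function name is illustrative, and the calculation is a raw historical frequency, not a statement about NASA's own risk models.

```python
def observed_loss_rate(losses: int, flights: int) -> float:
    """Raw historical frequency of catastrophic loss per flight."""
    return losses / flights


LOSSES = 2  # the two catastrophic results referred to in the text

# The article cites two different flight totals, so show both.
for flights in (111, 113):
    rate = observed_loss_rate(LOSSES, flights)
    print(f"{LOSSES} losses in {flights} flights: "
          f"{rate:.2%} per flight, about 1 in {flights / LOSSES:.1f}")
```

Either flight count gives a loss rate of a little under 2 percent per flight, so the uncertainty about the exact number of flights does not change the picture much.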
NASA’s Mission Unclear
Acceptable
risk level is a function, in part, of the purpose and importance of a
program. There is reason to believe that the purpose and mission of the
shuttle, and of the space program itself, are unclear. Sean O’Keefe is the
administrator of NASA, having been appointed in December of 2001. The
NASA website quotes O’Keefe: “In broad terms, our mandate is to
pioneer the future...to push the envelope...to do what has never been
done before.”
The
National Aeronautics and Space Act, passed by Congress in 1958, states:
“The Congress declares that the general welfare of the United
States requires that NASA (as
established by Title II of this act) seek and encourage to the maximum
extent possible, the fullest commercial use of space.”
NASA’s
earlier statement of purpose is quite clear. Administrator O’Keefe’s
statement, while visionary, is much less clear. The shuttle program is
an important part of the space program. Are the purpose and role of the
shuttle program clear?
The Changing Purpose of the Shuttle
The
shuttle was designed to provide a reusable means of transportation to
get people into orbit. One stated goal of the shuttle program as
presented to President Nixon in the early 1970s was to make space travel
routine, with weekly trips and rapid turnaround. Prior to the Challenger
disaster, some people in the program were saying that this goal had been
met. Seventeen years later, with the loss of Columbia, it is
apparent that the shuttle has not nearly achieved the reliability
required of a routine mode of transportation.
Initially,
the shuttle was to travel to a space station. However, the station was
deleted from the original plans due to cost. When the space station was
deleted, NASA sold Congress the idea that the shuttle could become a
commercial enterprise, hoping to win the funding required to continue
the shuttle program. It would be a primary means of doing work by
contract such as launching satellites. It would in effect pay for
itself. This never came close to becoming a reality.
NASA
told Congress that the shuttle would fly weekly, at a cost of $5 to $10
million per flight. Today, the cost--as reported in this week’s Economist
magazine--is $500 million per flight. Of course, the shuttle makes only five
or six flights per year, and fees for commercial use of the shuttle
never came close to the per-flight cost.
A Little History
The
U.S. space program, along with those of other nations, has its origin in
the early 1950s. In 1952, the International Council of Scientific Unions
decided to establish July 1, 1957, through December 31, 1958, as
International Geophysical Year (IGY) because cycles of solar activity
would be at a high point during that time, making it an excellent
opportunity for exploration and experimentation. As we know, the Soviet
Union was the first country to successfully launch a satellite, Sputnik I,
on October 4, 1957, and it repeated the feat with Sputnik II on November 3
of the same year. The U.S. Congress passed the Space Act the following
year, creating NASA and the modern U.S. space program.
Kennedy’s Mission Statement
Our
space program and NASA got an enormous shot in the arm when President
Kennedy set a goal to send people safely to the moon and back within the
decade of the 1960s. This was accomplished more than once, with one
disaster (the Apollo 1 fire that caused the deaths of
three astronauts--Grissom, White and Chaffee--while still on the ground
in 1967) and one near disaster (Apollo 13).
During
this time and up until 1972, when the last moon shot took place, the
U.S. space program had a clear mission and purpose. A book about the
Mercury and Apollo missions is entitled Failure Is Not an Option.
This title sums up the focus and attitude toward risk mitigation that
results from a clear mission statement and sense of purpose.
Moon Shots End, Space Stations and Shuttle Gain Center Stage
The
successful moon shots of the Apollo program ended with Apollo 17
in 1972. Since that time the direction and purpose of the space program
generally, and the space shuttle program specifically, have been less
clear.
In
1972 the funding for the space shuttle was approved--but not for the
space station it was designed to serve. The shuttle was born with an
unclear mission. In 1981, the first shuttle was launched. While it was
conceived as a shuttle, it is very often described as an orbiter, partly
because until the late 1990s it had no destination other than Earth's
orbit.
In
1984, the concept of the International Space Station was envisaged, and
in 1995 Russia, 11 European states, Japan, Canada and the United
States endorsed a plan to build the station. In 1998, 15 nations agreed
to work together to build the station, with construction starting that
year.
The Shuttle and International Space Station (ISS)
The
shuttle and International Space Station are now components of an
approach to space exploration that has been discussed, designed and
redesigned for many years. Today, however, the mission of the ISS itself
is also more vague than it once was. Originally thought of as a way
station for launches into deeper space (including Mars exploration), it
is now the site of various outer space experiments.
In
his Time magazine Viewpoint article entitled “The
Space Shuttle Must Be Stopped,” Gregg Easterbrook says that the
space station was created to provide a destination for the shuttle and
the shuttle was created to provide transportation to the station.
What is the Mission of the Shuttle Today?
Space
station technology has had a long history. The current design is
primarily the work of western and Russian scientists. Collaboration
between nations has continued. For example, between 1995 and 1998, U.S.
space shuttles docked with the Russian Mir space station nine
times. In fact, the day after the Columbia disaster, a Russian
Progress supply craft lifted off atop a Soyuz rocket to restock the ISS.
One
can only wonder why there was a U.S. shuttle mission that did not (and,
as it turns out, could not) visit the space station while at the same
time a Russian craft was ready, willing and able to perform what was
originally a primary function of the U.S. shuttle program. Today we must
ask for clarification of the shuttle’s purpose. In fact, we must ask
for a review of the overall plan for space exploration.
There
are many questions in our minds about the shuttle’s purpose:
- Is the shuttle a means of transportation designed to routinely carry
  civilians or military personnel on critical missions? Who will ride the
  shuttle?
- Is it an orbiting laboratory or a transportation system?
- What kind of experiments should be funded? To what purpose? Why can’t the
  experiments be done in the space station?
- How frequently is the shuttle needed? When first conceived, it was thought
  that shuttles would fly to the space station once a week.
- What size payloads does it need to deliver?
Risk and Probability of Failure
Lack
of clarity of mission and purpose has made risk analysis very difficult.
Time magazine this week says: “In a flying machine with more
than 2.5 million parts, even a 99.9 percent reliability level would
still leave 2,500 things to go wrong.”
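The arithmetic behind a claim like this is simple enough to check. The sketch below multiplies the part count by the per-part failure fraction; the function name is illustrative, and treating every part as having the same reliability is a simplifying assumption, not something the quoted article states.

```python
def expected_faulty_parts(part_count: int, reliability: float) -> float:
    """Expected number of parts that fail, given one reliability for all parts."""
    return part_count * (1.0 - reliability)


PARTS = 2_500_000  # "more than 2.5 million parts", per the Time quote

# Compare three nines with four nines of per-part reliability.
for reliability in (0.999, 0.9999):
    faulty = expected_faulty_parts(PARTS, reliability)
    print(f"{reliability:.2%} reliable: about {faulty:,.0f} things to go wrong")
```

Three nines (99.9 percent) leaves about 2,500 potential failures across 2.5 million parts, while four nines (99.99 percent) leaves about 250.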
I’ve
seen other references to "four 9s reliability" attributed to
the shuttle. These references are, of course, out of context and can’t
possibly refer to the reliability of the shuttle as a system. Some
people refer to four 9s and say it’s the best you can expect. This
attitude contributes to an acceptance of an indeterminate level of risk,
which can lead to the acceptance of inadequate systems and/or a lack of
emphasis on risk mitigation.
What
is the risk of catastrophic failure of the shuttle? According to an
article published in the Boston Globe, referencing
the Congressional Record, the expected failure rate of the existing
shuttle design is 1 in every 250 voyages. A recent Newsweek
article says: “There was a time 17 years ago, before the space shuttle
Challenger blew up, when NASA claimed the likelihood of a
catastrophic accident was 1 in 100,000. Since that time, NASA has
increased the risk to 1 in 148.” However, in a Sunday New York
Times article, a different number is given: 1 in 78 before the Challenger
disaster; 1 in 483 according to recent NASA statements.
A
New York Times article quotes Theodore Postol, an MIT professor
and expert on complex space systems, saying: “The most reliable
systems or boosters, outside the human space program, have about one
failure in 50 tries, for a 98 percent success rate.”
According
to the Globe article, a second-generation spacecraft has been
under development. The second-generation system has a reported design
objective of a failure rate of 1 in 10,000 missions. According to the Globe,
$1.3 billion has been spent on the second generation program to date,
but it is stalled at the present time with decisions regarding how to
proceed delayed until at least 2005. It’s pretty clear that the
failure rate NASA has expected over the years has itself been unclear
and has changed along with the mission.
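One way to see how far apart these published estimates are is to ask what each would predict over the roughly 113 flights flown so far. The sketch below is a rough illustration only: it assumes every flight is independent and carries the same loss probability (a simple binomial model, which is an assumption of this example and not NASA's methodology), and the names used are hypothetical.

```python
from math import comb


def prob_at_least(k: int, flights: int, p: float) -> float:
    """P(at least k losses) over independent flights with per-flight loss probability p."""
    below_k = sum(
        comb(flights, i) * p**i * (1 - p) ** (flights - i) for i in range(k)
    )
    return 1.0 - below_k


FLIGHTS = 113  # approximate flight count cited earlier in the article

# Per-flight loss estimates quoted in the text (assumed constant per flight).
ESTIMATES = {
    "1 in 78": 1 / 78,
    "1 in 148": 1 / 148,
    "1 in 250": 1 / 250,
    "1 in 483": 1 / 483,
    "1 in 100,000": 1 / 100_000,
}

for label, p in ESTIMATES.items():
    one = prob_at_least(1, FLIGHTS, p)
    two = prob_at_least(2, FLIGHTS, p)
    print(f"{label:>13}: P(1+ losses) = {one:.1%}, P(2+ losses) = {two:.2%}")
```

Under this simplified model, the higher estimates make two losses in 113 flights unsurprising, while the 1-in-100,000 claim makes them all but impossible, which is one way to quantify how much the expected failure rate has shifted over the years.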
Another
Globe article on Sunday entitled “Officials had warned of
NASA safety issues” quotes an expert opinion that I believe is shared
by many. The expert is Howard McCurdy, a social scientist at American
University and author of six books
about NASA: “The shuttle has always been a high-risk venture,”
McCurdy writes. “You worry that it finally caught up with us: the
fundamental design. There are a lot of components on the shuttle for
which a component loss can very quickly cause a vehicle loss.”
Professor
Postol of MIT is quoted further in the Times article: “You
have a system where the consequence of what appear to be relatively
minor failures quickly propagate into catastrophe. It looks like the
system is roughly as reliable as most other launch systems of roughly
comparable complexity.”
Professor
Postol added: “You’re riding a stick of dynamite into space. We know
how to do that, but sticks of dynamite can explode.”
These
expert opinions and characterizations, coupled with the failure rates
associated with the shuttle, add up to a design not safe enough to
reliably transport people. The shuttle, many believe, is still an
experimental mode of transportation.
Is Risk “Normalized” at NASA?
In
1996 Diane Vaughan, a professor of sociology at Boston
College, published an analysis of the Challenger
disaster. Her book, entitled The Challenger Launch Decision: Risky
Technology, Culture and Deviance at NASA, concludes that the NASA
culture allowed and even facilitated the acceptance of a known flaw in
the rocket boosters as an acceptable risk when many of the engineers
involved in the design of the boosters recommended against the continued
use of the design.
The
night before the fateful launch of Challenger, engineers from
the contractor Morton Thiokol, to quote the book, “argued against
launching on the grounds that the O-rings were a threat to flight
safety. NASA managers decided to proceed.” The book points out that
the O-ring problem was well-known for years; the relationship between
temperature and failure had been established with low temperatures
tending to cause failure.
The
design was thought by many experts to be seriously flawed. Vaughan's
research, which was extensive and conducted over nine years, finds that
at NASA, risk tends to be normalized. That is, when a risk is
identified, the culture at NASA tends to work the risk into normal
operational standards. In other words, NASA does not always work to
redesign risky components. Whether this is an acceptable practice or not
depends, in part, on the specific nature of the mission.
It
appears that the NASA culture may not have changed enough since 1986.
Today we don’t know what caused the Columbia disaster. However, the heat
shield made of ceramic tiles is a chief suspect. The tiles have been a
known problem since the first shuttle launch in 1981, STS-1, which was also
a Columbia launch. Reports say that during that launch 12 tiles fell off the
shuttle. If tile failure or damage turns out to be the cause of this
latest disaster, history will have been allowed to repeat itself.
Whether the disaster was caused by tile failure or not, the risks
associated with travel on the shuttle appear to be far higher than an
acceptable risk, given the shuttle’s almost routine usage.
This
concern is heightened by the apparent lack of a rigorous response to the
insulation that fell off Columbia on
lift-off. During the flight of Columbia, no attempt was made to
ensure that no damage was done by the falling insulation. In the first
few days after the disaster, NASA explained this by saying there was
nothing that could be done anyway.
No
space walk to examine or repair anything was possible, nor was docking with
the space station. If the ship was doomed at that point, nothing
could be done. This seems incredible given what was at stake, the nature
of the mission, and the dangers and known problems associated with the
shuttle.
Clear Goals and Objectives are Needed
The
national agenda has moved away from the space program. Many citizens and
government leaders feel it is wasteful and not critical to our
country’s well-being. Explorers usually have a physical destination:
North America, the moon, Mars or beyond.
It’s been 30 years since anyone has traveled more than 300 miles from
Earth. In hindsight, if our goals were to orbit the Earth, perform
experiments and launch and use telescopes to visually explore space, we
would have designed a different system than the one we have.
While
it makes us feel good to say this tragedy will not deter us from space
exploration, we must stop now and reconsider every aspect of our space
program. Our objectives must be clarified in the context of the broader
national agenda. Once we know exactly what is at stake, determination of
acceptable risk targets can be made (the current ones seem to be too
unfavorable by one or two orders of magnitude). Knowing the mission,
purpose and acceptable risk will enable our scientists and engineers to
design a system that will eventually perform as required.
Lessons Learned
NASA’s
track record, given that it has undertaken some of the most complex and
difficult projects and longest programs in history, is superb. However,
we can see in hindsight that difficult, risky tasks require crystal
clear statements of purpose and near perfect alignment within the team.
At NASA, the culture of pride and dedication creates teams that do great
work day-in and day-out.
What
has been lacking is continuous reinforcement of the specific mission
statement so that it is clear to all. Most IT projects have missions
that are mundane compared to those of NASA. On such workaday projects,
it is very easy to lose focus and forget what the project is about.
Project teams very often stop questioning and stop seeking
clarification. Leadership forgets to remind the team what the mission
is. This can lead to project failure.
There
are some lessons we can take away from the space shuttle’s
difficulties:
- Lesson No. 1: Never stop questioning; be certain that you know how your
  work fits into the overall mission and purpose of your project. Read the
  scope documents every day. If no scope documents exist, get them built or
  leave the team. If they are unclear, get them clarified.
- Lesson No. 2: Without a clear understanding of the purpose and mission of
  our work, we can’t know what risk probability is acceptable. This is
  incredibly dangerous! Every program and project has inherent risk. The
  purpose of risk mitigation is to reduce the probability of the risk to an
  acceptable level. We must know what level of risk is acceptable in order
  to know how much resource should be spent in risk mitigation.
- Lesson No. 3: In very risky work, it is not helpful to continuously say,
  “risk is unavoidable.” There is a huge difference between rolling the dice
  and mitigating risk. If there is risk that can’t be mitigated in some way,
  the project plan must be questioned. For IT projects, this might mean (for
  example) admitting that the timetable is not feasible and negotiating for
  more time rather than leading the team on a death march.
- Lesson No. 4: Question the impact of changes in requirements on design.
  The Space Shuttle was designed 30 years ago for a purpose that has
  changed. Is the design suited to today’s purpose? The same question needs
  to be asked throughout the life of any program or project.
- Lesson No. 5: Know the culture of your company. How are risk and its
  mitigation usually managed? For example, if a risk is discovered, does the
  organization tend to shoot the messenger? Is there a tendency to
  rationalize issues such as risks or does the organization meet them
  head-on? Understanding the culture you operate in will help you manage
  risks and other project impacts.
Copyright © Gantthead, 2003