By Randall W. Rice
Testing has been described as an art (The Art of Software Testing by Glenford
Myers), a craft (The Craft of Software Testing by Brian Marick), and a
process (Effective Methods for Software Testing by William E. Perry),
but I would like to examine another aspect of testing: the Science of Software Testing.
A Brief Background
I graduated from college with a Bachelor of Science degree as a math major,
which was accidental. I started out as an electrical engineering major
but changed late in my junior year when I discovered that EE really
wasn't as appealing to me as I had originally thought it would be. In
my schooling, I was trained in the traditional scientific method, which
affects how I see things.
The Traditional Scientific Method
The traditional scientific method has been the predominant method for
people to observe and understand the operation of the world and the
universe. In recent years, some scientists have developed methods that
are less rigorous, but the traditional method is what I will use as the
basis for this article. The steps in the traditional method are:
1. Observe some aspect of the universe.
2. Invent a theory that is consistent with what you have observed.
3. Use the theory to make predictions.
4. Test those predictions by experiments or further observations.
5. Modify the theory in the light of your results.
6. Go to step 3.
There are differing views among scientists today as to what constitutes a
theory, a hypothesis and a fact. How someone defines these terms can
greatly affect their view of science. To fully expound on these
differing views of the scientific method and how the terms are defined
is beyond the scope of this article. It is important, however, to
understand that there is often a level of bias, since people tend to hold
definitions that are consistent with their beliefs about how the
world operates. This is circular reasoning: if I am trying to
explain how and why something happens, my explanation will be based to
some degree on a framework that I already believe in. Therefore, one of
the great challenges of science is to maintain objectivity.
To establish working definitions for this article, I will use the
following terms, which I do not propose to be perfect or
accepted by everyone. The source is Webster's Revised Unabridged
Dictionary, © 1996, 1998 MICRA, Inc.
Observation - "(a) The act of recognizing and noting some fact or occurrence in
nature, as an aurora, a corona, or the structure of an animal. (b)
Specifically, the act of measuring, with suitable instruments, some
magnitude, as the time of an occultation, with a clock; the right
ascension of a star, with a transit instrument and clock; the sun's
altitude, or the distance of the moon from a star, with a sextant; the
temperature, with a thermometer, etc. (c) The information so acquired.
Note: When a phenomenon is scrutinized as it occurs in nature, the act is
termed an observation. When the conditions under which the phenomenon
occurs are artificial, or arranged beforehand by the observer, the
process is called an experiment. Experiment includes observation."
Experiment - "1. A trial or special observation, made to confirm or disprove
something doubtful; esp., one under conditions determined by the
experimenter; an act or operation undertaken in order to discover some
unknown principle or effect, or to test, establish, or illustrate some
suggested or known truth; practical test; proof."
Hypothesis - "1. A supposition; a proposition or principle which is supposed or
taken for granted, in order to draw a conclusion or inference for proof
of the point in question; something not proved, but assumed for the
purpose of argument, or to account for a fact or an occurrence; as, the
hypothesis that head winds detain an overdue steamer.
2. (Natural Science) A tentative theory or supposition provisionally
adopted to explain certain facts, and to guide in the investigation of
others; hence, frequently called a working hypothesis."
Assumption - "The thing supposed; a postulate, or proposition assumed; a
supposition."
Theory - "1. A doctrine, or scheme of things, which terminates in speculation
or contemplation, without a view to practice; hypothesis; speculation.
2. An exposition of the general or abstract principles of any science; as,
the theory of music.
3. The science, as distinguished from the art; as, the theory and practice of
medicine.
4. The philosophical explanation of phenomena, either physical or moral; as,
Lavoisier's theory of combustion; Adam Smith's theory of moral
sentiments."
Fact - "2. An effect produced or achieved; anything done or that comes to
pass; an act; an event; a circumstance.
3. Reality; actuality; truth; as, he, in fact, excelled all the rest; the
fact is, he was beaten.
4. The assertion or statement of a thing done or existing; sometimes, even
when false, improperly put, by a transfer of meaning, for the thing
done, or supposed to be done; a thing supposed or asserted to be done;
as, history abounds with false facts."
Law - "5. In philosophy and physics: A rule of being, operation, or change,
so certain and constant that it is conceived of as imposed by the will
of God or by some controlling authority; as, the law of gravitation;
the laws of motion; the law of heredity; the laws of thought; the laws of
cause and effect; law of self-preservation.
6. In mathematics: The rule according to which anything, as the change of
value of a variable, or the value of the terms of a series, proceeds;
mode or order of sequence.
7. In arts, works, games, etc.: The rules of construction, or of procedure,
conforming to the conditions of success; a principle, maxim; or usage;
as, the laws of poetry, of architecture, of courtesy, or of whist."
The Science of Software Testing
Some testing methods are performed at a "junk science" level, which is often
based on small sample sizes and poorly controlled or documented
experiments. In software development, this is usually called the demo
and is performed by executing the software with constructed test cases
that are known in advance to work.
Rigorous testing, on the other hand, is based on observing the difference
between the actual behavior of the software under test and its expected
behavior (the hypothesis). Testing should be seen as both
verification (testing against specifications) and validation (testing
against the real world). Both verification and validation are needed
because specifications aren't perfect.
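The contrast between a demo and a rigorous test can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `apply_discount` is a hypothetical function under test, and its rule (a percentage from 0 to 100) stands in for a specification. The demo list holds only a case known to work, while the rigorous list also states expected behavior at the boundaries and for invalid input.

```python
def apply_discount(price, percent):
    """Hypothetical function under test: reduce a price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (100 - percent) / 100, 2)


# "Demo" cases: inputs that are already known to work.
DEMO_CASES = [((100.0, 10), 90.0)]

# Rigorous cases: boundaries and invalid input, each with an expected result.
RIGOROUS_CASES = DEMO_CASES + [
    ((100.0, 0), 100.0),
    ((100.0, 100), 0.0),
    ((100.0, 101), ValueError),
    ((100.0, -1), ValueError),
]


def check(cases):
    """Return (args, expected, actual) for every case that does not match."""
    mismatches = []
    for args, expected in cases:
        try:
            actual = apply_discount(*args)
        except Exception as exc:
            actual = type(exc)  # the exception type is the observed behavior
        if actual != expected:
            mismatches.append((args, expected, actual))
    return mismatches
```

An empty list from `check(RIGOROUS_CASES)` says far more about the software than an empty list from `check(DEMO_CASES)`, because the rigorous cases had a real chance of exposing a difference between actual and expected behavior.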
Aspects of the Science of Software Testing
Pre-definition of Expected Results
Pre-definition of expected results is similar to a scientist predicting the
outcome of an experiment before it is performed by proposing a
hypothesis. Predicting the outcome in advance
adds a degree of rigor to the findings. If you wait until the
experiment is over and try to interpret the results in light of your
understanding and observation, it is easy to convince yourself and
others that what you observed was a validation of your hypothesis after
all, all things considered. When the actual results of the experiment
do not match your pre-defined expected results, the discrepancy should
lead you to question the experiment, the hypothesis, or both.
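One way to make the "predict first" discipline concrete is to have the test record refuse a prediction made after the run. The following is only a sketch under that assumption; the `Experiment` class and its method names are hypothetical and do not come from any testing library.

```python
from dataclasses import dataclass

_UNSET = object()  # sentinel meaning "no value recorded yet"


@dataclass
class Experiment:
    """Hypothetical record that forces the prediction to come before the run."""
    name: str
    expected: object = _UNSET
    actual: object = _UNSET

    def predict(self, expected):
        # A prediction made after the result is known is not a prediction.
        if self.actual is not _UNSET:
            raise RuntimeError("expected result must be defined before the run")
        self.expected = expected

    def run(self, func, *args):
        if self.expected is _UNSET:
            raise RuntimeError("no expected result was defined in advance")
        self.actual = func(*args)
        return self.actual == self.expected
```

Calling `predict` and then `run` returns whether the observation matched the prediction. A `False` result does not say which side is wrong; as noted above, it is a reason to question the experiment, the hypothesis, or both.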
Observation
Without observation, it is impossible to tell the outcome of a test or an
experiment. Although this makes sense, it is tempting to design tests
and experiments that are difficult if not impossible to observe. We may
want to prove or test something, but real-world constraints prevent
constructing an accurate experiment. That's why you can't test
everything – not everything is testable.
Repeatability
In science, an experiment may be performed thousands of times before a
trend can be established. The first time a result is observed, the
scientist isn't sure if the result was due to an unknown aspect of the
experiment or a predictable behavior of the subject. To provide a
confirmation of the experiment, it may need to be repeated many times.
Likewise, in testing, when a defect is observed, the first test may be
seen as the indicator and follow-up tests may be seen as the
confirmation. After a defect has been fixed, the test must be repeated
exactly as before to ensure the fix works. Although this sounds simple,
it may be very difficult in actual practice to get the second test
environment set up exactly as the first test environment.
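Test data is one part of the setup that is usually easy to make repeatable. As a minimal sketch, assume the test inputs are generated pseudo-randomly: passing an explicit seed to a private generator lets the second run see exactly the inputs the first run saw. The functions `generate_orders` and `total` are hypothetical names used only for this example.

```python
import random


def generate_orders(seed, count=5):
    """Build pseudo-random (quantity, unit price) pairs from an explicit seed."""
    rng = random.Random(seed)  # private generator, unaffected by global state
    return [
        (rng.randint(1, 10), round(rng.uniform(1.0, 100.0), 2))
        for _ in range(count)
    ]


def total(orders):
    """Hypothetical code under test: sum of quantity times unit price."""
    return round(sum(qty * price for qty, price in orders), 2)
```

Recording the seed alongside the defect report means the retest after a fix can use the same inputs. It does not address the rest of the environment, which is the harder part described above.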
Construction of the Experiment
In scientific research, experiments are carefully planned and controlled.
The laboratory environment grew out of the need to prove conditions
during an experiment and to repeat the experiment. In this analogy of
testing as science, what many people do is perform experiments in their
kitchen, not in a controlled laboratory environment. This is a critical
lapse, as the test environment can impact many external and internal
factors of the test which could very well lead to false test results. I
would venture to say that no other discipline could get away with the
lax methods used in many software tests, especially where environments
are concerned.
In testing, carefully constructed and controlled environments are
sometimes needed to get the level of test reliability that is
appropriate to the risk. Your test is only as good as the test
environment!
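A small step toward a controlled environment is to record what the environment actually was when a test ran, so that two runs can be compared. The sketch below assumes a Python-based test harness and uses only the standard library; the function names and the choice of environment variables are illustrative assumptions, and a real project would capture far more (library versions, database state, configuration files).

```python
import os
import platform


def environment_snapshot(env_vars=("PATH", "LANG", "TZ")):
    """Collect a few facts about the environment a test ran in."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "machine": platform.machine(),
        "env": {name: os.environ.get(name) for name in env_vars},
    }


def environment_diff(first, second):
    """Return the snapshot keys whose values differ between two runs."""
    keys = first.keys() | second.keys()
    return sorted(key for key in keys if first.get(key) != second.get(key))
```

Storing a snapshot with each test result makes it possible to ask, when a retest disagrees with the original test, whether the software changed or the laboratory did.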
Performing the Experiment
The performance of the experiment is an exercise in carefully following the
design of the experiment. The research scientist doesn't improvise
unless the work is deliberately outside the plan. Granted, some of the
great scientific discoveries have occurred because the researcher tried
something other than the planned experiment, but these are exceptions
rather than the rule. In testing, it is important to stick to your test
plan. It's all right to test other cases; just be sure you document what
you did so you can repeat the test if you have to.
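Documenting unplanned tests can be as simple as logging each step as it is executed. The sketch below is one hypothetical way to do that in Python; `SessionLog` and its methods are invented for this example, and it assumes the steps are plain function calls whose arguments and results can be written out as JSON.

```python
import json


class SessionLog:
    """Record each executed step so an unplanned test can be repeated later."""

    def __init__(self):
        self.steps = []

    def run(self, func, *args, planned=True):
        """Execute one step and log what was called, with what, and the result."""
        result = func(*args)
        self.steps.append({
            "call": func.__name__,
            "args": list(args),
            "planned": planned,
            "result": result,
        })
        return result

    def replay(self, registry):
        """Re-run every logged step; return the steps whose result changed."""
        return [
            step for step in self.steps
            if registry[step["call"]](*step["args"]) != step["result"]
        ]

    def to_json(self):
        """Serialize the log so it can be stored with the test results."""
        return json.dumps(self.steps, indent=2)
```

A step run with `planned=False` is marked as a departure from the test plan, and `replay` repeats the whole session against a mapping of names to functions.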
Having a Control Group
In scientific experiments, control groups are used as a baseline for
comparison of results. For example, a researcher might test a trial
medication on one group of people while giving a placebo to another
group. In "double blind" research, neither the people in the experiment
nor the researchers working with them know who has been given
the real medication and who has been given the placebo, which helps
to counteract subconscious biases. In testing, we also need a baseline
of correct system behavior for comparison.
Interpreting Results and Drawing Conclusions
One of the great challenges of science is to
observe the tests of a hypothesis and make a reasoned interpretation of
the results. The challenge lies in maintaining
objectivity and having the courage to report what you actually observed
as opposed to what someone else expected to see. Gee, that sounds
familiar. In testing, you can only speak to what you have observed. It
is unrealistic and unwise to predict what would have been seen
in tests that were not performed.
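Reporting only what was observed has a direct counterpart in how test results are summarized: a test that was planned but never executed should be reported as not run, not silently counted as a pass. The following sketch assumes hypothetical test names and a simple three-state outcome; it is an illustration, not the reporting format of any particular tool.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    NOT_RUN = "not run"


def summarize(planned, observed):
    """Report every planned test, marking unobserved ones as NOT_RUN."""
    report = {name: observed.get(name, Outcome.NOT_RUN) for name in planned}
    counts = Counter(report.values())
    return report, counts
```

With three planned tests and two observed outcomes, the summary shows one test as `NOT_RUN`, which keeps the report honest about what the evidence does and does not cover.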
Modifying the Hypothesis
In testing, the main hypothesis is often that the system should work under
given conditions. However, there is an opposing hypothesis:
although the system should work, there are defects in it that need to
be found. The second hypothesis is the safer one.
When your test results show that the second hypothesis is true, a
fundamental shift starts to occur in the minds and attitudes of those
who held the first hypothesis. This is when many people, instead of
modifying the hypothesis, try to discredit or invalidate the experiment.
This often takes the form of blaming the testers for the defects, which
is like blaming a research scientist for the results of a correctly
performed experiment.
However, let's say for a moment that people reach agreement that the first
hypothesis was wrong and that the software does have defects and needs
to be fixed. This may imply that the software will be delivered late
and other people will be held accountable. Although these consequences
may occur, people need to face reality and correct the problems instead
of focusing on their own agendas.
Perhaps this aspect of testing is most closely aligned to the science we see
practiced today. If the research confirms the hypothesis, we hear about
it. If the research supports a contrary hypothesis, especially one that
may be politically incorrect, those findings may never be published.
The Longer View
The reason scientific research is performed is to explain the way
observable nature behaves. I would also add that a great benefit of
that knowledge is to improve things we currently do. It has been said
that the thing that distinguished Thomas Edison from other inventors
was that he always had a keen sense of how science could help people by
improving their lives. Edison also had a good sense of business.
He knew when to stop research and build the project.
In testing, the initial goal is to find defects. However, that is a
short-sighted view and fails to make the best of the resources that
have been expended on creating and fixing the defect. The longer view
of testing is to build ways to prevent similar problems in the future
by improving the processes used to build the product.
I believe we are far from seeing software testing performed as a
scientific process, but it gives us something to think about and relate
to, especially when it comes to evaluating the rigor and reliability of
test results.
Conclusion
There are many points in common between software testing and traditional
science. In fact, software testing may be closer to a science than to
anything else we can relate it to. These similarities can be helpful in
understanding testing and explaining testing to others. The
similarities also provide a benchmark for how rigorous we are
in defining and performing testing processes. Although not every
test needs to be performed with the rigor of scientific research,
some tests need to be performed at that level because of high risk.
All materials on this site
copyright 1996 - 2008, Rice Consulting Services, Inc.