Coining a New Term

I'm coining a new term today, "grapefruit juice bugs."

My inspiration for this term is a blog post in the New York Times that David Pogue wrote. I was fascinated by the post and it got me to thinking about a particular kind of bugs in software that are more common than most people may realize. You could say that these bugs are surprisingly common. In fact, if you wanted to be more precise, you could even say that this term applies to a specific type of "surprisingly common type of surprising bugs." Let me explain.

There's something about the chemical makeup of grapefruit juice that makes it interact with our biology and a large number of different drugs in ways which result in dangerous conditions. For example, certain drugs lose their effectiveness dramatically when interacting with grapefruit juice which can have life-threatening consequences. Other times, the interactions with grapefruit juice can dramatically increase a drug's potency. This can result in "safe doses" becoming very unsafe.

Grapefruit Is a Culprit in More Drug Reactions

The 42-year-old was barely responding when her husband brought her to the emergency room. Her heart rate was slowing, and her blood pressure was falling. Doctors had to insert a breathing tube, and then a pacemaker, to revive her.

They were mystified: The patient’s husband said she suffered from migraines and was taking a blood pressure drug called verapamil to help prevent the headaches. But blood tests showed she had an alarming amount of the drug in her system, five times the safe level.

Did she overdose? Was she trying to commit suicide? It was only after she recovered that doctors were able to piece the story together.

“The culprit was grapefruit juice,” said Dr. Unni Pillai, a nephrologist in St. Louis, Mo. ...

The previous week, she had been subsisting mainly on grapefruit juice. Then she took verapamil, one of dozens of drugs whose potency is dramatically increased if taken with grapefruit. In her case, the interaction was life-threatening.

Last month, Dr. David Bailey, a Canadian researcher who first described this interaction more than two decades ago, released an updated list of medications affected by grapefruit. There are now 85 such drugs on the market, he noted, including common cholesterol-lowering drugs, new anticancer agents, and some synthetic opiates and psychiatric drugs, as well as certain immunosuppressant medications taken by organ transplant patients, some AIDS medications, and some birth control pills and estrogen treatments. ... Under normal circumstances, the drugs are metabolized in the gastrointestinal tract, and relatively little is absorbed, because an enzyme in the gut called CYP3A4 deactivates them. But grapefruit contains natural chemicals called furanocoumarins, that inhibit the enzyme, and without it the gut absorbs much more of a drug and blood levels rise dramatically.

For example, someone taking simvastatin (brand name Zocor) who also drinks a small 200-milliliter, or 6.7 ounces, glass of grapefruit juice once a day for three days could see blood levels of the drug triple, increasing the risk for rhabdomyolysis, a breakdown of muscle that can cause kidney damage.


So what do interactions between grapefruit juice and drugs have to do with software testing?

Like grapefruit juice's impact on prescription drugs, software testing involves critical interactions between different parts of the system. And risks exist when these different parts interact with one another. This is true whether you're talking about "large parts" interact in System Testing or "small parts" interact in Unit Testing.

Interactions between things are a very rich source of bugs in software. As anyone who has heard the infernal phrase "works on my machine" can tell you, software features and functions often work perfectly fine in many usage scenarios, hardware and software configurations , etc. - only to fail to work in ever-so-slightly different situations.


The difference between plain old every-day "Dual-Mode Faults" and "Grapefruit Juice Bugs"

A dual-mode fault occurs whenever two test inputs must both be present to trigger a defect. Most software testers start encountering them quite frequently within days of starting their jobs. Some examples:

  • This "buy" button works fine. Except when the customer is a "new user." (First, action = "click on the buy button" and Second, customer = "new user")

  • Transaction prices for share purchases are calculated correctly. Except when denominated in Japanese Yen. (First, Action = "sell shares" and Second, Currency = "Japanese Yen")

Like grapefruit juice's impact on prescription drugs, software testing involves critical interactions between different parts of the system. And risks exist when these different parts interact with one another. This is true whether you're talking about "large parts" interact in System Testing or "small parts" interact in Unit Testing.

While all grapefruit juice bugs are dual-mode faults, not all dual-mode faults are Grapefruit Juice Bugs:

  • Grapefruit juice bugs have got to have a little of the element of surprise in them. When you explain them to a developer, their first reaction should be "Huh? How is that even possible?" or at least "Hmmm... That's odd. Let me investigate."

  • Anything along the lines of "This feature usually works, except in IE6, when..." is almost definitely not a grapefruit juice bug. Problematic interactions with IE6 are an incredibly common type of dual-mode fault, not a surprising one.

Whenever you hear "works on my machine" replies to your bug reports, and it takes a while for the issue to be replicated, odds are pretty good that a grapefruit juice bug might be involved.

Here's an example of an especially surprising grapefruit juice bug. This excerpt from Apple's online help files that the company posted after users of the original iPad complained about problems with Wi-Fi connectivity. Certain screen brightness settings were causing problems with the Wi-Fi signals. I'm not even to begin to guess how one would have anything to do with the other.


How to identify grapefruit juice bugs during your testing?

What is a tester to do when faced with more possible potential grapefruit juice bugs than he can handle using traditional methods?

If you're a software tester trying to do your best to determine whether a feature or function in your System Under Test will work "on everyone's machine," you've got a nightmare on your hands . Really nasty combinatorial explosions arise when you consider all of the possible combinations that would be required to test multiple hardware options, multiple software options, multiple usage scenarios, multiple test data inputs (and multiple combinations of the test data itself), multiple ways in which users enter data, and all of the rest of the "stuff that could vary" when people use your application. If you take the time to think expansively about the possible variations in a medium-sized applications, Quadrillions of possible tests often result.

While not eating grapefruit and not drinking grapefruit juice might be wise if you are taking drugs, there is rarely, if ever, such an easy method for eliminating the possibility of negative results due to software interactions. Refusing to support IE 6 in order to avoid the disproportionate number of grapefruit juice-like problematic interactions associated with IE6 would be as close as you could come in the world of software.

Design of Experiments-based test design methods can help testers come to grips with this challenge. Orthogonal array software testing (often referred to as OATS or simply OA testing) is a test design strategy that allows us to efficiently detect bugs created by interactions within the system. Orthogonal array software testing is based on the principles of multifactor designed experiments as first explored by Sir RA Fisher.

Design of Experiments-based test design methods are very-closely related to pairwise testing (AKA allpairs testing, all pairs testing, and pairwise-testing). Any of these test design strategies will allow a software tester to quickly generate a set of tests that includes tests for every single pair of test inputs.

This approach to test design often has multiple advantages, including faster test creation, more varied test scenarios, 100% coverage of all potential dual-mode faults (including hard-to-predict grape-fruit juice bugs), and often a smaller resulting set of tests that will be quicker to execute. Having said that, it is by no means a magical silver bullet. This approach to test design requires test designers with above average analytical abilities to identify the appropriate Parameters and Values for their system under test; this is sometimes easier said than done because it requires a new mindset from test designers.

Software testers can take solace that the challenges of software testing, while significant, are simple when compared to trying to understand the effects of drug interactions in people.

Combinatorial testing can look at bugs created by the interaction between multiple (3, 4, 5, 6...) variables. So if there was a bug that didn't get triggered just by using Chrome on Windows but it would get triggered if you also tried to replace an existing photo in your profile with a new profile photo into your profile (test idea number 3), then pairwise testing might not catch it. Pairwise test design would create a set of tests that would include at least one test for each of these pairs:

  • Chrome & Windows and

  • Chrome & replace photo and

  • Windows & replace photo, but...

A set of pairwise might not fail to test for the specific combination of all three of those test inputs in the same test. With the use of combinatorial test design approaches, you could create test plans with 100% coverage for 3 way interactions and be sure that all 3-way interactions or 4-way interactions are covered. When you create sets of 3-way tests, 4-way tests, 5-way tests, and 6-way tests though, you'll quickly discover that the number of tests required starts to balloon.

Hexawise allows you to create test plans with the coverage interactions you desire. This allows you to create sets of tests from 2-way up all the way up to phenomenally-thorough 6-way sets of tests. In fact, it even lets you generate clever sets of risk-based tests that will, say, prioritize comprehensive 4-way coverage on 4 sets of Parameter Values while ensuring only pairwise coverage of the other, lower-priority, interactions in your system under tests. Hexawise also lets you create mixed strength test plans so if you have certain factors that you are very concerned about and want to provide coverage for more possible interactions you can set the interaction levels for those at a higher level.


Related: Hexawise Tip: Using Value Expansions and Value Pairs to Handle Dependent Values - Maximize Test Coverage Efficiency And Minimize the Number of Tests Needed - How to Model and Test CRUD Functionality - 25 Great Quotes for Software Testers

By: Justin Hunter on Feb 11, 2014

Categories: Bugs, Combinatorial Testing, Design of Experiments, Multi-variate Testing, Multi-variate Testing, Pairwise Software Testing, Software Testing, Testing Strategies

Many teams are trying to generate unusually powerful and varied sets of software tests by using Design of Experiments-based methods to generate many or most of their tests. The two most popular software test design methods are orthogonal array testing and pairwise testing. This article describes how these two approaches are similar but different and suggests that in most cases, pairwise testing is preferable.

Before advancing, it may be worth pointing out that Orthogonal Array Testing is also known as OA or OATS. Similarly, pairwise testing is sometimes referred to as all pairs testing, allpairs testing, pair testing, pair-wise testing, or simply 2-way testing. The difference between these two very similar approaches of pairwise vs. orthogonal array is that orthogonal array-based solutions require the same coverage goal that pairwise solutions do (e.g., that every pair of inputs is tested at least once) plus an additional hurdle/characteristic, that there be a uniform distribution throughout the domain.

I have studied the question of how can software testing inputs be combined most efficiently and effectively pretty steadily for the last 7 years. I started by searching the web for "Design of Experiments" and "software testing" and found references to Dr. Madhav Phadke (who, by coincidence, turns out was a former student of my father).

  • I discovered that Dr. Phadke had designed RDExpert which, although it had been primarily created to help with Research & Design projects in manufacturing settings, could also be used to select small sets of powerful test sets in software testing projects, using the Orthogonal Array-based test selection criteria.

  • I used RDExpert to create test sets (and compared those test sets against sets of tests that had been selected manually by software testers)

  • I gathered results by asking one tester to execute the manually selected tests and another tester to execute the the Orthogonal Array-based tests; the OA-based tests dramatically outperformed the manually-selected ones in terms of defects found per tester hour and defexts found overall.

So, in short, I had confirmed to my satisfaction that an OA-based test data combination strategy was far more effective than manually selecting combinations for the kinds of projects I was working on, but I was curious if other techniques worked better.


After more study I have concluded that:

  • Pairwise is more efficient and effective than orthogonal arrays for software testing.

  • Orthogonal Arrays are more efficient and effective for manufacturing, and agriculture, and advertising, and many other settings.


And we have built Hexawise as a software tool to help software producers test their software, based on what I have learned from my experience. We take full advantage of the greatly increased efficiency and effectiveness of letting testers to determine what needs to be tested and software algorithms to quickly create comprehensive test plans that provide more coverage with dramatically fewer tests.

But we also go well beyond this to create a software as a service solution that aids the software testing team with many huge advantages such as: automatically generating Expected Results in test scripts, automated importing of data from Excel or mind maps, exporting tests into other tools, preventing impossible to test for values from appearing together, and much more.


Why is a pairwise testing strategy better than an orthogonal array strategy?

  • Pairwise testing almost always requires fewer tests than orthogonal array-based solutions (it is possible, in some situations, for them to have an equal number of tests).

  • Remember, the reason that orthogonal array-based solutions require more tests than a pairwise solution to reach the coverage goal of testing all pairs of test conditions together in at least one test is the additional hurdle/characteristic that orthogonal array testing has, e.g., that there be a uniform distribution throughout the domain.

  • The "cost" of the extra tests (AKA experiments) is worth paying in many settings outside of the software testing industry because the results are non-binary in those tests. Someone seeking a desired darkness and gloss and luminosity and luster for a particular shade of green in the processing of film, for example, would benefit from with the information obtained from the added information gathered from orthogonal arrays.

  • In software testing, however, the added costs imposed by the the extra tests are not worth it. You're generally not seeking some ideal point in a continuum; you're looking to see what two specific pieces of data will trigger a defect when they appear in the same transaction. To identify that binary approach most efficiently and effectively, what you want is a pairwise solution (with fewer tests), not a longer list of orthogonal array-based tests.


Let me also add these points.

  • First, unlike some of my other views on combinatorial test design, my opinion on this narrow subject is not based on multiple empirical studies; it is based on (a) the reasoning I laid out above, and (b) a dozen or so conversations I've had with PhD's who specialize in the intersection of "Design of Experiments" and software test design, and (c) anecdotal evidence from using both methods.

  • Secondly, to my knowledge,very few, if any, studies have gathered empirical data showing benefits of pairwise solutions vs. orthogonal array-based solutions in software testing scenarios.

  • Thirdly, I strongly suspect that if you asked Dr. Phadke, he would give you his reasons for why orthogonal array-based solutions are appropriate (and even preferable) to pairwise test case selection methods for certain kinds of software projects. I have a huge amount of respect for both him and his son.


Time doesn't allow me to get into this last point much now, but "mixed strength" tests are another even more powerful test design approach for you to be aware of as well. With mixed strength testing solutions, the test designer is able to select a default coverage strength for the entire plan (e.g., pairwise / AKA 2-way coverage) and, in the same set of tests, select certain high priority values to receive higher coverage strength (e.g., 4-way coverage strength selected for each "Credit Rating" and "Income" and "Loan Amount" and "Loan to Value Ratio" would give you a palm that achieved pairwise coverage for everything in the plan plus comprehensive coverage for every imaginable combination of values from those four high priority parameters. This approach allows you to focus on risk-based testing considerations.


Sorry if I got a bit long-winded. It's a topic I'm passionate about.

Originally posted on Stack Exchange, Additional note added after the first 3 comments were submitted:

@Hannibal, @Peter K., and @MichaelF, Thanks for your comments! If you'd like to read more about this stuff, I recommend the multiple links available through this "bundle of links" about pairwise testing and combinatorial testing. In particular, Michael Bolton's article on pairwise testing is directly relevant and very clearly written. It is one of the few introductory articles around that accurately describes the difference between orthogonal array-based solutions and pairwise solutions. If I remember correctly though, the example Michael uses is a rare exception to the rule; the OA solution has the same number of tests as an optimal pairwise solution does.

Related: The Empirical Evidence for Using Pairwise and Combinatorial Software Testing - 3 Strategies to Maximize Effectiveness of Your Tests - Hexawise TV

More than 100 Fortune 500 firms use Hexawise to design their software tests. While large companies pay six figures per year for enterprise licenses, Hexawise is available for free to schools, open source projects, other non-profits, and teams of up to 5 users from any kind of company. Sign up for your Hexawise account.

By: John Hunter and Justin Hunter on Jun 11, 2013

Categories: Combinatorial Testing, Design of Experiments, Efficiency, Multi-variate Testing, Pairwise Software Testing, Software Testing, Testing Strategies, Experimenting


Design of Experiments in Software Testing - Pairwise and Combinatorial - Hexawise

Justin Hunter @Hexawise:


Removing inefficiency is good, sure, but it is not why Design of Experiments is so friggin' powerful. Saying DoE is interesting to know about because it can help identify and remove specific inefficiencies is a bit like saying Canada is a good country to visit because you can sometimes find a good cup of coffee there. To my mind, saying DoE is primarily about removing inefficiency misses the main point.

Design of Experiments is so powerful because it allows practitioners to predictably, systematically, and consistently find out more useful, actionable information in much less time than they would otherwise take to obtain this information (if they could find it at all with their less-structured approaches).

In manufacturing circles (e.g., when engineers produce new prototypes), DoE's ability to do this is no longer questioned. This is because leaders like George Box taught people in industry how to apply DoE and they gathered conclusive evidence that DoE allowed manufacturers to learn much faster through techniques like applying factorial designs. Box and other DoE experts (Taguchi, Montgomery, my dad, etc.) dealt with skeptical manufacturing engineers for four decades by showing them the facts and using DoE on the skeptics' own projects right under their noses. The evidence that DoE allows manufacturers to learn much faster (about a wide variety of learning goals) than the other methods they used prior to 1960 is incontrovertible.

In 2010, in the gradually maturing field of software testing, Design of Experiments-based methods of test case design has not caught on much at all yet. As an industry, it's adoption of DoE-based approaches is roughly where manufacturing was in 1960. Most software testers, even very good ones, don't know anything at all about how DoE can help them. Many other software testers have heard a bit about pairwise but mistakenly think pairwise and related, structured, DoE-based, test case selection method can't help them.

Even some of the best testers in the world who have written some of the most clearly-written and well-reasoned articles about pairwise approaches do not (in my view) seem to fully-understand: (a) how powerful the benefits are, (b) how often the approach can be applied / in how many diverse kinds of testing situations they can be utilized, and/or (c) how consistently the efficiency and effectiveness benefits are be generated when they are used properly. DoE methods, including pairwise and n-wise and mixed strength automatic test condition generation (made possible by tools like our Hexawise tool and also, to a great extent by James Bach's free AllPairs tool) allow software testers to learn much faster about critically important questions like: (1) where are the bugs?, (2) what is causing the bugs to appear?, (3) am I confident I have efficiently tested for a huge range of combinations of values in the System Under Test that might trigger defects? (4) am I succeeding in avoiding redundant repetition of steps in many test cases?, (5) how many bugs would be likely to find if we were to continue to run the next 100 tests?, etc.

In summary, the reason for the existence of Design of Experiments methods (whether we're talking about their applicability to testing software as efficiently and effectively as possible, or DoE methods' applicability to a huge variety of other objectives) - and, for that matter, the reason that they have been continuously refined and improved for 40+ years - is that DoE methods consistently and predictably allow users to learn actionable results as quickly as possible.


Related: Maximize Test Coverage Efficiency And Minimize the Number of Tests Needed - Pairwise and Combinatorial Software Testing in Agile Projects - Video Highlight Reel of Hexawise

By: Justin Hunter on Feb 11, 2013

Categories: Design of Experiments, Multi-variate Testing, Recommended Tool, Testing Strategies

Combinatorial Software Test Design - Beyond Pairwise Testing


I put this together to explain combinatorial software test design methods in an accessible manner. I hope you enjoy it and that, if you do, that you'll consider trying to create test cases for your next testing project (whether you choose our Hexawise test case generator or some other test design tool).


Where I'm Coming From

As those of you know who read my posts, read my articles, and/or have attended my testing conference presentations, I am a passionate proponent of these approaches to software test design that maximize variation from test case to test case and minimize repetition. It's not much of an exaggeration to say I hardly write or talk publicly about any other software testing-related topics. My own consistent experiences and formal studies indicate that pairwise, orthogonal array-based, and combinatorial test design approaches often lead to a doubling of tester productivity (as measured in defects found per tester hour) as compared to the far more prevalent practice in the software testing industry of selecting and documenting test cases by hand. How is it possible that this approach generates such a dramatic increase in productivity? What is so different between the manually-selected test cases and the pair-wise or combinatorial testing cases? Why isn't this test design technique far more broadly adopted than it is?


A Common Challenge to Understanding: Complicated, Wonky Explanation

My suspicion is that a significant reason that combinatorial software testing methods are not much more widely adopted is that many of the articles describing it are simply too complex and/or too abstract for many testers to understand and apply. Such articles say things like:

A. Mathematical Model


A pairwise test suite is a t-way interaction test suite where t = 2. A t-way interaction test suite is a mathematical structure, called a covering array.

Definition 1 A covering array, CA(N; t, k, |v|), is an N × k array from a set, v, of values (symbols) such that every N × t subarray contains all tuples of size t (t-tuples) from the |v| values at least once [8].

The strength of a covering array is t, which defines, for example, 2-way (pairwise) or 3-way interaction test suite. The k columns of this array are called factors, where each factor has |v| values. In general, most software systems do not have the same number of values for each factor. A more general structure can be defined that allows variability of |v|.

Definition 2 A mixed level covering array, MCA (N; t, k, (|v1|,|v2|,..., |vk|)), is an N × k array on |v| values, where

| v |␣ ␣k | vi | , with the following properties: (1) Each i␣1

column i (1 ␣ i ␣ k) contains only elements from a set Si of size |vi|. (2) The rows of each N × t subarray cover all t-tuples of values from the t columns at least once.

  • "Construct Pairwise Test Suites Based on the Bak-Sneppen Model of Biological Evolution" World Academy of Science, Engineering and Technology 59 2009 - Jianjun Yuan, Changjun Jiang


If you're a typical software tester, even one motivated to try new methods to improve your skills, you could be forgiven for not mustering up the enthusiasm to read such articles. The relevancy, the power, and the applicability of combinatorial testing - not to mention that this test design method can often double your software testing efficiency and increase the thoroughness of your software testing - all tend to get lost in the abstract, academic, wonky explanations that are typically used to describe combinatorial testing. Unfortunately for pragmatic, action-oriented software testing practitioners, many of the readily accessible articles on pairwise testing and combinatorial testing tend to be on the wonky end of the spectrum; an exception to that general rule are the good, practitioner-oriented introductory articles available at


A Different Approach to Explaining Combinatorial Testing and Pairwise Testing

In the photograph-rich, numbers-light, presentation embedded above, I've tried to explain what combinatorial testing is all about without the wonky-ness. The benefits from structured variation and from using combinatorial test design is, in my view, wildly under-appreciated. It has the following extremely important benefits:

  • Less repetition from test case to test case

    • In the context of discussing testing's "pesticide paradox" James Bach, I believe, used the analogy that following in someone's footsteps is a very good way to survive traversing through a mine field but a generally lousy way to find software defects efficiently.
    • Maximizing variation from test case to test case, as a general rule, is an absolutely spectacular way to find defects quickly.
    • There are thousands, if not trillions of relevant combinations to select from when identifying test cases to execute; computer algorithms will be able to solve the problem of "how can maximum variation be achieved?" far better than human brains can.
  • More coverage of combinations of test inputs

    • Most of the time, since awareness of pairwise and combinatorial testing methods remain low in the software testing community, combining all possible pairs of values in at least one test case is not even a conscious goal of testers.
    • Even if this were a goal of their test design strategy, testers would have a tremendous challenge in trying to achieve such a goal: with hundreds, thousands or tens of thousands of targeted combinations to cover, losing track of a significant number of them and/or forgetting to include them in software tests is virtually a foregone conclusion unless a test case generator is used.
    • More thorough coverage leads to more defects being found.
  • Efficiency (Testers can "turn the coverage dial" to achieve maximum efficiency with a minimal number of tests)

    • The efficiency and effectiveness benefits of pairwise testing have been demonstrated in testing projects every major industry.
    • I wanted to prominently include the message that testers using test case generators have the option to dramatically increase the testing thoroughness levels of the tests they generate because it is a topic that often gets ignored in introductions to pairwise testing case studies and introductions
  • Thoroughness - (Testers can also "turn the coverage dial" to achieve maximum thoroughness if that is their goal)

    • Too often, tester's view pairwise as a technique that focuses on a very small number of curiously strong tests; that is only part of the story.
    • This can lead to the /false/ impression that combinatorial testing methods are inappropriate where high levels of testing thoroughness are required.
    • You can create very different sets of tests that are as thorough as possible (given your understanding of what you are testing) no matter whether you have 1 hour to execute tests or one month to test.


Other Recommended Sources of Information on Pairwise and Combinatorial Testing:

By: Justin Hunter on Oct 7, 2010

Categories: Combinatorial Software Testing, Combinatorial Testing, Design of Experiments, Hexawise test case generating tool, Multi-variate Testing, Pairwise Software Testing, Pairwise Testing, Recommended Tool, Testing Strategies, Uncategorized

There are good reasons James Bach is so well known among the testing community and constantly invited to give keynote presentations around the globe at software testing conferences. He's passionate about testing and educating testers; he's a gifted, energetic, and entertaining speaker with a great sense of humor; and he takes joy in rattling his saber and attacking well-established institutions and schools of thought that he disagrees with. He doesn't take kindly to people who make inflated claims of benefits that would materialize "if only you'd perform testing in XYZ way or with ABC tool" given that (a) he can always seem to find exceptions to such claims, (b) he doesn't shy away from confrontation, and (c) he (rightly, in my view) thinks that such benefits statements tend to discount the importance of critical thinking skills being used by testers and other important context-specific considerations.

Leave it up to James to create a list of 13 questions that would be great to ask the next software testing tool vendor who shows up to pitch his problem-solving product. In his blog post titled "The Essence of Heuristics," he posed this exact set of questions in a slightly different context, but as a software testing tool vendor myself, they really hit home. They are:


  1. Do they teach you how to tell if it’s working?
  2. Do they teach you how to tell if it’s going wrong?
  3. Do they teach you heuristics for stopping?
  4. Do they teach you heuristics for knowing when to apply it?
  5. Do they compare it to alternative heuristics?
  6. Do they show you why it works?
  7. Do they help you understand when it probably works best?
  8. Do they help you know how to re-design it, if needed?
  9. Do they let you own it?
  10. Do they ask you to practice it?
  11. Do they tell stories about how it has failed?
  12. Do they listen to you when you question or challenge it?
  13. Do they praise you for questioning and challenging it?


[Side note: Apparently I wasn't the only one who thought of Hexawise and pairwise / combinatorial test design approaches when they saw these 13 questions. I was amused that after I drafted this post, I saw Jared Quinert's / @xflibble's tweet just now:]


Where do I come down on each of James' 13 questions with respect to people I talk to about our test design tool, Hexawise, and the types of benefits and the size of benefits it typically delivers? Quite simply, "Yes" to all 13. I enjoy talking about exactly the kinds of questions that James raised in his list. In fact, when I sought out James to ask him questions at a conference in Boston earlier this year, it was because I wanted his perspective on many of the points above, particularly #11: (hearing stories about how James has seen pairwise and combinatorial approaches to test design fail), and #7 (hearing his views on where it works best and where it would be difficult to apply it). I'll save my specific answers to another post, but I am serious about wanting to share my thoughts on them; time constraints are holding me back today. I gave a speech at the ASQ World Conference on Quality Improvement in St. Louis last week though that addressed many, but not all, of James' questions.

I'm not your typical software tool vendor. Basically, my natural instincts are all wrong for sales. I agree with the premise that "a fool with a tool is still a fool"; when talking to target clients and/or potential partners, I'm inclined to point out deficiencies, limitations, and various things that could go wrong; I'm more of an introvert than an extrovert, etc. Not exactly the typical characteristics of a successful salesman... Having said that, I believe that we've built a very good tool that helps enable dramatic efficiency and thoroughness benefits in many testing situations but our tool, along with the pairwise and combinatorial test design approaches that Hexawise enables both have their limitations. It is primarily by talking to software testers about their positive and negative experiences that our company is able to improve our tool, enhance our training, and provide honest, pragmatic guidance to users about where and how to use our tool (and where and how not to).

Tool vendors who defend their tools (and/or the approaches by which their tools helps users solve problems) as magical, silver bullet solutions are being both foolish and dishonest. Tool vendors who choose not to engage in serious, honest and open discussions with users about the challenges that users have when applying their tools in different situations are being short-sighted. From my own experiences, I can say that talking about the 13 topics raised by James have been invaluable.

By: Justin Hunter on Jun 1, 2010

Categories: Combinatorial Testing, Design of Experiments, Hexawise test case generating tool, Pairwise Testing, Software Testing, Software Testing Efficiency, Uncategorized



All the quotes below are from the inside cover of Statistics for Experimenters written by George Box, Stuart Hunter, and William G. Hunter (my late father). The Design of Experiments methods expressed in the book (namely, the science of finding out as much information as possible in as few experiments as possible), were the inspiration behind our software test case generating tool. In paging through the book again today, I found it striking (but not surprising) how many of these quotes are directly relevant to efficient and effective software testing (and efficient and effective test case design strategies in particular):

  • "Discovering the unexpected is more important than confirming the known." - George Box

  • "All models are wrong; some models are useful." - George Box

  • "Don't fall in love with a model."

  • How, with a minimum of effort, can you discover what does what to what? Which factors do what to which responses?

  • "Anyone who has never made a mistake has never tried anything new." - Albert Einstein

  • "Seek computer programs that allow you to do the thinking."

  • "A computer should make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding." - F. J. Anscombe

  • "The best time to plan an experiment is after you've done it." - R. A. Fisher

  • "Sometimes the only thing you can do with a poorly designed experiment is to try to find out what it died of." - R. A. Fisher

  • The experimenter who believes that only one factor at a time should be varied, is amply provided for by using a factorial experiment.

  • Only in exceptional circumstances do you need or should you attempt to answer all the questions with one experiment.

  • "The business of life is to endeavor to find out what you don't know from what you do; that's what I called 'guessing what was on the other side of the hill.'" - Duke of Wellington

  • "To find out what happens when you change something, it is necessary to change it."

  • "An engineer who does not know experimental design is not an engineer." - Comment made by to one of the authors by an executive of the Toyota Motor Company

  • "Among those factors to be considered there will usually be the vital few and the trivial many." - J. M. Juran

  • "The most exciting phrase to hear in science, the one that heralds discoveries, is not 'Eureka!' but 'Now that's funny...'" - Isaac Asimov

  • "Not everything that can be counted counts and not everything that counts can be counted." - Albert Einstein

  • "You can see a lot by just looking." - Yogi Berra

  • "Few things are less common than common sense."

  • "Criteria must be reconsidered at every stage of an investigation."

  • "With sequential assembly, designs can be built up so that the complexity of the design matches that of the problem."

  • "A factorial design makes every observation do double (multiple) duty." - Jack Couden

Where the quotes are not attributed, I'm assuming the quote is from one of the authors. The most well known of the quotes not attributed, above, "All models are wrong; some models are useful." is widely attributed to George Box in particular, which is accurate. Although I forgot to confirm that suspicion with him when I saw him over Christmas break, I suspect most of them are from George (as opposed to from Stu or my dad); George is 90 now and still off-the-charts smart, funny, and is probably the best story teller I've met in my life. If he were younger and on Twitter, he'd be one of those guys who churned out highly retweetable chestnuts again and again. [Update - George Box died in 2013]


Related thoughts

As you know if you've read my blog before, I am a strong proponent of using the Design of Experiments principles laid out in this book and applying them in field of software testing to improve the efficiency and effectiveness of software test case design (e.g., by using pairwise software testing, orthogonal array software testing, and/or combinatorial software testing techniques). In fact, I decided to create my company's test case generating tool, called Hexawise, after using Design of Experiments-based test design methods during my time at Accenture in a couple dozen projects and measuring dramatic improvements in tester productivity (as well as dramatic reductions in the amount of time it took to identify and document test cases). We saw these improvements in every single pilot project when we used these methods to identify tests.

My goal, in continuing to improve our Hexawise test case generating tool, is to help make the efficiency-enhancing Design of Experiments methods embodied in the book, accessible to "regular" software testers, and more more broadly adopted throughout the software testing field. Some days, it feels like a shame that the approaches from the Design of Experiments field (extremely well-known and broadly used in manufacturing industries across the globe, in research and development labs of all kinds, in product development projects in chemicals, pharmaceuticals, and a wide variety of other fields), have not made much of an inroad into software testing. The irony is, it is hard to think of a field in which it is easier, quicker, or immediately obvious to prove that dramatic benefits result from adopting Design of Experiments methods than software testing. All it takes is for a testing team to decide to do a simple proof of concept pilot. It could be for as little as a half-day's testing activity for one tester. Create a set of pairwise tests with Hexawise or another t00l like James Bach's AllPairs tool. Have one tester execute the tests suggested by the test case generating tool. Have the other tester(s) test the same application in parallel. Measure four things:

  1. How long did it take to create the pairwise / DoE-based test cases?

  2. How many defects were found per hour by the tester(s) who executed the "business as usual" test cases?

  3. How many defects were found per hour by the tester who executed the pairwise / DoE-based tests?

  4. How many defects were identified overall by each plan's tests?

These four simple measurements will typically demonstrate dramatic improvements in:

  • Speed of test case identification and documentation

  • Efficiency in defects found per hour

As well as consistent improvements to:

  • Overall thoroughness of testing.


A Suggestion: Experiment / Learn / Get the Data / Let the Efficiency and Effectiveness Findings Guide You

I would be thrilled if this blog post gave you the motivation to explore this testing approach and measure the results. Whether you've used similar-sounding techniques before or never heard of DoE-based software testing methods before, whether you're a software testing newbie or a grizzled veteran, I suspect the experience of running a structured proof of concept pilot (and seeing the dramatic benefits I'm confident you'll see) could be a watershed moment in your testing career. Try it! If you're interested in conducting a pilot, I'd be happy to help get you started and if you'd be willing to share the results of your pilot publicly, I'd be able to provide ongoing advice and test plan review. Send me an email or leave a comment.

To the grizzled and skeptical veterans, (and yes, Mr, Shrini Kulkarni / @shrinik who tweeted "@Hexawise With all due respect. I can't credit any technique the superpower of 2X defect finding capability. sumthng else must be goingon" before you actually conducted a proof of concept using Design of Experiments-based testing methods and analyzed your findings, I'm lookin' at you), I would (re)quote Sophocles: "One must try by doing the thing; for though you think you know it, you have no certainty until you try." For newer testers, eager to expand your testing knowledge (and perhaps gain an enormous amount of credibility by taking the initiative, while you're at it), I'd (re)quote Cole Porter: "Experiment and you'll see!"

I'd welcome your comments and questions. If you're feeling, "Sounds too good to be true, but heck, I can secure a tester for half a day to run some of these DoE-based / pairwise tests and gather some data to see whether or not it leads to a step-change improvement in efficiency and effectiveness of our testing" and you're wondering how you'd get started, I'd be happy to help you out and do so at no cost to you. All I'd ask is that you share your findings with the world (e.g., in your blog or let me use your data as the firms did with their findings in the "Combinatorial Software Testing" article below).



By: Justin Hunter on Jan 27, 2010

Categories: Combinatorial Testing, Design of Experiments, Hexawise test case generating tool, Multi-variate Testing, Software Testing

There are some phrases in English that, as often as not, come off sounding obligatory and/or insincere. The phrase "I'm honored..." comes to mind (particularly if someone is accepting an award in front of a room full of people).

Be that as it may, I genuinely felt really honored last night and again today by a couple comments James Bach has said about me, including these:




Here's the quick background: (1) James knows much more about software testing than I do and I respect his views a lot. (2) He has a reputation for not suffering fools gladly and pretty bluntly telling people he doesn't respect them if he doesn't respect the content of their views. (3) in addition to his extremely broad expertise on "testing in general" James, like Michael Bolton, knows a lot about pairwise and combinatorial testing methods and how to use them. (4) I firmly (and passionately) believe that pairwise and combinatorial testing methods are (a) dramatically under-appreciated, and (b) dramatically under-utilized. (5) James has published a very good and well-reasoned article about some of the limitations of pairwise testing methods that I wanted to talk to him about. (6) I co-wrote an article that IEEE Computer recently published about Combinatorial Testing that I wanted to discuss with him. (7) James and I have been at the STP Conference in Boston over the past few days. (8) I reached out to him and asked to meet at the conference to talk about pairwise and combinatorial testing methods and share with him my findings that - in the dozens of projects I've been involved with that have compared testers efficiency and effectiveness - I've routinely seen defects found per tester hour more than double. (9) I was interested in getting his insights into where are these methods most applicable? Least applicable? What have his experiences been in teaching combinatorial testing methods to students, etc.

In short, frankly, my goals in meeting with him were to: (a) meet someone new, interesting and knowledgeable and learn as much as could and try to understand from his experiences, his impressive critical thinking and his questioning nature, and (b) avoid tripping up with sloppy reasoning (when unapologetically expressing the reasons I feel combinatorial testing methods are dramatically under-appreciated by the software testing community) in front of someone who (i) can smell BS a mile away, and (ii) doesn't suffer fools gladly.

I learned a lot, heard some fantastic war stories and heard his excellent counter-examples that disproved a couple of the generalizations I was making (but didn't dampen my unshaken assertions that combinatorial testing methods are wildly under-utilized by the software testing community). I thoroughly enjoyed the experience. Moving forward, as a result of our meeting, I will go through an exercise which will make me more effective (namely carefully thinking through and enumerating all of the assumptions behind my statements like: "I've measured the effectiveness of testers dozens of times - trying to control external variables as much as reasonably possible - and I'm consistently seeing more than twice as many defects per tester hour when testers adopt pairwise/combinatorial testing methods."

His complement last night was private so I won't share it but it ranks up there in my all time favorite complements I've ever received. I'm honored. Thanks James.

By: Justin Hunter on Oct 23, 2009

Categories: Combinatorial Testing, Design of Experiments, Efficiency, Interesting People , Pairwise Testing, Software Testing, Software Testing Efficiency, Testing Case Studies, Uncategorized

lessons from car manufacturing-20090826-1718522


Tony Baer from Ovum recently wrote a blog post titled: Software Development is like Manufacturing which included the following quotes.

"More recently, debate has emerged over yet another refinement of agile – Lean Development, which borrows many of the total quality improvement and continuous waste reduction principles of [lean manufacturing]( Lean is dedicated to elimination of waste, but not at all costs (like Six Sigma. Instead, it is about continuous improvement in quality, which will lead to waste reduction....

In essence, developing software is like making a durable good like a car, appliance, military transport, machine tool, or consumer electronics product.... you are building complex products that are expected to have a long service life, and which may require updates or repairs."

Here are my views: I see valid points on both sides of the debate. Rather than weigh general high-level pro's and cons, though, I would like to zero in on what I see as an important topic that is all-too-often missing from the debate. Specifically, Design of Experiments has been central to Six Sigma, Lean Manufacturing, the Toyota Production System, and Deming's quality improvement approaches, and is equally applicable to software development and testing, yet adoption of Design of Experiments methods in software design and testing remains low. This is unfortunate because significant benefits consistently result in both software development and software testing when Design of Experiments methods are properly implemented.

What are Design of Experiments Methods and Why are they Relevant?

In short, Design of Experiments methods are a proven approach to creating and managing experiments that alter variables intelligently between each test run in a structured way that allows the experimenter to learn as much as possible in as few experiments as possible. From wikipedia: “Design of experiments, or experimental design, (DoE) is the design of all information-gathering exercises where variation is present, whether under the full control of the experimenter or not. Often the experimenter is interested in the effect of some process or intervention (the “treatment”) on some objects (the “experimental units”).”

Design of Experiments methods are an important aspect of Lean Manufacturing, Six Sigma, the Toyota Production System, and other manufacturing-related quality improvement approaches/philosophies. Not only have Design of Experiments methods been very important to all of the above in manufacturing settings, they are also directly relevant to software development. By way of example, W. Edwards Deming, who was extremely influential in quality initiatives in manufacturing in Japan and the U.S. was an applied statistician. He and thousands of other highly respected quality executives in manufacturing, including Box, Juran and Taguchi (and even my dad), have regularly used Design of Experiments methods as a fundamental anchor of quality improvement and QA initiatives and yet relatively few people who write about software development seem to be aware of the existence of Design of Experiments methods.

What Benefits are Delivered in Software Development by Design of Experiments-based Tools?

Application Optimization applications, like Google’s Website Optimizer are a good example of Design of Experiments methods can deliver powerful benefits in the software development process. It allows users to easily vary multiple aspects of web pages (images, descriptions, colors, fonts, colors, logos, etc.) and capture the results of user actions to identify which combinations work the best. A recent YouTube multi-variate experiment (e.g., and experiment created using Design of Experiment methods) shows how they used the simple tool and increased sign-up rates by 15.7%. The experiment involved 1,024 variations.

What Benefits are Delivered in Software Testing by Design of Experiments-based Tools

In addition, software test design tools, like the Hexawise test design tool my company created, enable dramatically more efficient software testing by automatically varying different elements of use cases that are tested in order to achieve an optimal coverage. Users input the things in the application they want to test, push a button and, as in the Google Web Optimizer example, the tool uses DoE algorithms to identify how the tests should be run to maximize efficiency and thoroughness. A recent IEEE Computer article I contributed to, titled "Combinatorial Testing" shows, on average, over the course of 10 separate real-world projects, tester productivity (measured in defects found per tester hour) more than doubled, as compared to the control groups which continued to use their standard manual methods of test case selection:

Unfortunately, Design of Experiments methods – one of the most powerful methods in Lean Manufacturing, Six Sigma, and the Toyota Production System – are not yet widely adopted in the software development industry. This is unfortunate for two reasons, namely:

  1. Design of Experiments methods will consistently deliver measurable benefits when implemented correctly, and

  2. Sophisticated new tools designed with very straightforward user interfaces make it easier than ever for software developers and testers to begin using these helpful methods.

By: Justin Hunter on Aug 25, 2009

Categories: Agile, Design of Experiments, Efficiency, Lean, Multi-variate Testing, Software Testing, Software Testing Efficiency



Jeff Fry recently linked to a fantastic webcast in Controlled Experiments To Test For Bugs In Our Mental Models. I would highly recommend it to anyone without any reservations. Ron Kohavi, of Microsoft Research does a superb job of using interesting real-world examples to explain the benefits of conducting small experiments with web site content and the advantages of making data-driven decisions. The link to the 22-minute video is here.

I firmly believe that the power of applied statistics-based experiments to improve products is dramatically under-appreciated by businesses (and, for that matter, business schools), as well as the software development and software testing communities. Google, Toyota, and come to mind as notable exceptions to this generalization; they "get it". Most firms though still operate, to their detriment, with their heads in the sand and place too much reliance on untested guesswork, even for fundamentally important decisions that would be relatively easy to double-check, refine, and optimize through small applied statistics-based experiments that Kohavi advocates. Few people who understand how to properly conduct such experiments are as articulate and concise as Kohavi. Admittedly, I could be accused of being biased as: (a) I am the son of a prominent applied statistician who passionately promoted broader adoption of such methods by industry and (b) I am the founder of a software testing tools company that uses applied statistics-based methods and algorithms to make our tool work.

Here is a short summary of Kohavi's presentation:


Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO

1:00 Amazon: in 2000, Greg Linden wanted to add recommendations in shopping carts during the check out process. The "HiPPO" (meaning the Highest Paid Person's Opinion) was against it thinking that such recommendations would confuse and/or distract people. Amazon, a company with a good culture of experimentation, decided to run a small experiment anyway, "just to get the data" – It was wildly successful and is in widespread use today at Amazon and other firms.

3:00 Dr. Footcare example: Including a coupon code above the total price to be paid had a dramatic impact on abandonment rates.

4:00 "Was this answer useful?" Dramatic differences in user response rates occur when Y/N is replaced with 5 Stars and whether an empty text box is initially shown with either (or whether it is triggered only after a user clicks to give their initial response)

6:00 Sewing machines: experimenting with a sales promotion strategy led to extremely counter-intuitive pricing choice

7:00 "We are really, really bad at understanding what is going to work with customers…"

7:30 "DATA TRUMPS INTUITION" {especially on novel ideas}. Get valuable data through quick, cheap experimentation. "The less the data, the stronger the opinions."

8:00 Overall Evaluation Criteria: "OEC" What will you measure? What are you trying to optimize? (Optimizing for the “customer lifetime value”)

9:00 Analyzing data / looking under the hood is often useful to get meaningful answers as to what really happened and why

10:30 A/B tests are good; more sophisticated multi-variate testing methods are often better

12:00 Some problems: Agreeing upon Overall Evaluation Criteria is hard culturally. People will rarely agree. If there are 10 changes per page, you will need to break things down into smaller experiments.

14:00 Many people are afraid of multiple experiments [e.g., multi-variate experiments or MVE] much more than they should be.

(A/B testing can be as simple as changing a single variable and comparing what happens when it is changed, e.g., A = "web page background = Blue" / B = "web page background = Orange." Multi-variate experiments involve changing multiple variables in each test run which means that people running the tests should be able to efficiently and effectively change the variables in order to ensure not only that each of the variables is tested but also that the each of the variables is tested in conjunction with each of the others because they might interact with one another). My views on this: before software tools made conducting multi-variate experiments (and understanding the results of the experiments) a piece of cake, this fear had some merit; you would need to be able to understand books like this to be able to competently run and analyze such experiments. Today however, many tools, such as Google's Website Optimizer (used for making web sites better at achieving their click through goals, etc.) and Hexawise (used to find defects with fewer test cases) build the complex Design of Experiments-based optimization algorithms into the tool's computation engine and provide the user of the tool with a simple user interface and user experience. In short, in 2009, you don't need a PhD in applied statistics to conduct powerful multi-variate experiments. Everyone can quickly learn how to, and almost all companies should, use these methods to improve the effectiveness of applications, products and/or production methods. Similarly, everyone can quickly learn how to, and almost all companies should, use these methods to dramatically improve the effectiveness of their software testing processes.>

16:00 People do a very bad job at understanding natural variation and are often too quick to jump to conclusions.

17:00 eBay does A/B testing and makes the control group ~1%. Ron Kohavi, the presenter, suggests starting small then quickly ramping up to 50/50 (e.g., 50% of viewers will see version A, 50% will see version B).

19:00 Beware of launching experiments than "do not hurt," there are feature maintenance costs.

20:00 Drive to a data-driven culture. "It makes a huge difference. People who have worked in a data-driven culture really, really love it… At Amazon… we built an optimization system that replaced all the debates that used to happen on Fridays about what gets on the home page with something that is automated."

21:00 Microsoft will be releasing its controlled experiments on the web platform at some point in the future, but probably not in the next year.

21:00 Summary

  1. Listen to your customers because our intuition at assessing new ideas is poor.

  2. Don't let the HiPPO drive decisions; they are likely to be wrong. Instead, let the customer data drive decisions.

  3. Experiment often create a trustworthy system to accelerate innovation.


Related: Statistics for Experimenters - Articles on design of experiments

By: Justin Hunter on Aug 18, 2009

Categories: Design of Experiments, Multi-variate Testing, Software Testing