When thinking about how to test software, these questions can help you think of ideas you might otherwise miss. These tips are useful for anyone testing software; we also include tips that are specifically relevant to those using the Hexawise software testing application.

  • What additional things could I add that probably won't matter (but might)?
  • What are the two parameters in the plan with the most values, and should I shrink the number of values for either one? If one or two parameters have many more values than the rest, that can greatly increase the number of test cases needed to explore every pairwise (or higher-order) combination, since every pairwise test set must be at least as large as the product of the two largest value counts (see the sketch after this list). If those values are critical to test, then keep them, but a high number of parameter values is often a sign that they could be reduced. Using value expansions may well be a wise choice.
  • Do the tests cover all the high-priority scenarios that stakeholders want covered? Once you generate the tests, think about high-priority scenarios, and if any are missing, add them as required test cases. It may well be worth adding tests for the most common scenarios. Often this can be done without increasing the total number of tests: Hexawise will take the required scenarios into account and generate test cases to match your criteria. Sometimes it will add a test or a few tests to the total, in which case you can decide whether the benefit is worth the added cost of those tests.
  • If you're testing a process, what are the if/then points where the process diverges? Make sure the alternative processes are being tested.
  • If the plan is heavily constrained, would it make sense to split it into multiple separate plans?
  • Might it make a difference to how the system operates if you were to include additional variation ideas related to: who, what, when, where, why, how and how many? Those questions are often helpful in identifying parameters that should be tested (and occasionally in identifying parameter values to test).
  • Have you accidentally included values in your plan that everyone is confident won't impact the system? You want to test everything that is important, but testing things that will not impact whether the software works adds cost with no benefit.
  • Have you accidentally included "script killing" negative tests in your plan? See details on this point in our training file, different types of negative tests need to be handled differently when using Hexawise
  • Have you clearly thought about the distinction between combinations that are impossible to test and combinations that will lead the system to display error messages? Impossible-to-test combinations can be avoided using the invalid pair feature. But situations where users can enter values that should result in error messages should be tested, to validate that those error messages appear.
  • Have you considered the cost/benefit trade-off of executing, say, 50% of the Hexawise-generated test set vs 75% vs 100%? It is best to think about the most sensible business decision instead of automatically going for some specific percentage. Depending on the software being tested and the impact on your business and customers, different levels of testing will make sense.
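On the second question above, the math is worth seeing. Here is a quick back-of-the-envelope check (plain Python with a hypothetical plan; this is not a Hexawise feature) of why the two largest parameters set a floor on the size of any pairwise test set:

    # Every pairwise test set must be at least as large as the product of the
    # two largest value counts: each combination of those two parameters'
    # values needs its own test. Shrinking those two lists shrinks everything.
    plan = {
        "Browser": ["IE7", "IE8", "Firefox", "Chrome"],
        "OS": ["Windows", "Mac", "Linux"],
        "User type": ["Admin", "Manager", "Agent", "Customer", "Guest", "Auditor"],
        "Payment": ["Cash", "Credit"],
    }

    sizes = sorted((len(values) for values in plan.values()), reverse=True)
    print("Minimum possible pairwise tests:", sizes[0] * sizes[1])  # 6 * 4 = 24

Trimming "User type" from six values to three would cut that floor from 24 to 12, which is why value expansions on oversized parameters pay off so quickly.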

Too often, software testing fails to emphasize the importance of experienced software testers using their brains, insight, experience, and knowledge to make judgments about the best course of action. Hexawise is a wonderful tool for software testers, but the tool needs a smart and thoughtful person to wield it and produce the results we all want to see.

Related: Software Testers Are Test Pilots - Create a Risk-based Testing Plan With Extra Coverage on Higher Priority Areas - How Much Detail to Include in Test Cases?

By: John Hunter and Justin Hunter on Mar 14, 2016

Categories: Testing Strategies, Software Testing, Hexawise test case generating tool, Combinatorial Software Testing, Hexawise tips

What is the Coverage Matrix?

Some of the most challenging questions testing teams are asked include:

  • Are we testing enough?
  • Are we testing too much?
  • What is the level of testing coverage these tests achieve?
  • What if we get extremely pressed for time… What level of coverage could we achieve in half as many tests as we have planned?

Hexawise now allows you to visualize testing coverage more precisely than ever. The precise pairwise interactions covered by Hexawise-generated tests are displayed in the Coverage Matrix. This is different from our Coverage Graph, which allows users to see the additional coverage each test in their test set provides but does so without as much granular detail.

The Hexawise Coverage Matrix is a grid made up of all the pairs in the test plan being analyzed. With each value listed down the side and also across the top, you can find an intersection and see if that specific pair is covered at any point in your set of Hexawise-generated tests. Read on for the specific features of this reporting function.

 

The Coverage Matrix in action

Before we dive into the specifics of the Coverage Matrix, let’s see it in action.

If we have a test plan and have created some tests, we need to go to the ‘Analyze Tests’ screen and click on Coverage Matrix. I recommend using a larger test plan (5+ parameters with a handful of values each) to truly see the power this visualization has.

 

analyze tests matrix arrow

‘OoooOOOooo’ snazzy new layout. That’s right, we’re taking it up a notch!

Once you click on the Coverage Matrix, the chart goes through an animation showing the coverage each test achieves.

 

Things to look for in the animation

  • Just beneath the slider over the chart, you can see what percentage of interactions is covered (just like the Coverage Graph)
  • Also underneath the slider, you can see the precise test that is being displayed
  • The Coverage Matrix starts fully red (0 tests equals 0% coverage)
  • As each test is added, the coverage increases, turning each newly covered pair into a green square
  • The slider at the top can be used after the animation is complete

 

Matrix Chart trimmed

 

How to read the Coverage Matrix

Now let’s get into the specifics of this feature.

 

What the squares mean

The Coverage Matrix has 3 primary indicators for coverage:

  1. Red Squares: A pair not currently covered
  2. Green Squares: A pair that has been covered
  3. Black Squares: A pair that will never be covered

 

coverage matrix

Red Squares: The Coverage Matrix starts off all red since, at 0 tests, 0 pairs are covered. The number of red squares decreases with each subsequent test as coverage increases. At any given test, the number of red squares equals the number of remaining pairwise gaps.

 

Green Squares: As each test is added, more pairs are covered. The squares convert from red to green. If you were to stop executing Hexawise-generated tests before the final test, you would leave gaps in your pairwise coverage. The green squares show you what you will have covered if you stop early.

 

Black Squares: If you add invalid or married pairs to your test plan, you are intentionally removing pairs from ever appearing in your test set. Usually that will be because such combinations are impossible or impractical to test together (such as O/S = Mac OSX with Browser = Internet Explorer). The pairs that are removed are shown as black squares in the Coverage Matrix.
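To make the three colors concrete, here is a rough sketch (ordinary Python with made-up tests and an assumed invalid pair; not Hexawise's implementation) of how the status of a single Coverage Matrix cell could be determined:

    # Each cell is a pair of values from two different parameters. A cell is
    # green once any test contains both values, black if the pair has been
    # excluded as an invalid pair, and red otherwise (a pairwise gap).
    tests = [
        {"Browser": "IE", "OS": "Windows", "User": "Admin"},
        {"Browser": "Firefox", "OS": "Mac", "User": "Regular"},
    ]
    invalid_pairs = {(("Browser", "IE"), ("OS", "Mac"))}  # impossible combination

    def cell_status(p1, v1, p2, v2):
        pair = tuple(sorted([(p1, v1), (p2, v2)]))
        if pair in invalid_pairs:
            return "black"  # will never be covered
        if any(t.get(p1) == v1 and t.get(p2) == v2 for t in tests):
            return "green"  # covered by at least one test
        return "red"        # not yet covered

    print(cell_status("Browser", "IE", "OS", "Windows"))       # green
    print(cell_status("Browser", "IE", "OS", "Mac"))           # black
    print(cell_status("Browser", "Firefox", "OS", "Windows"))  # red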

 

How the Coverage Matrix is laid out

Your parameter values are listed down the side and across the top. The intersection of the parameter values is the coverage status for a particular pair of values.

This is a Coverage Matrix

 

coverage matrix

Your parameter values

To make for a compact display, the parameters are listed down the side from the first to the second-to-last, and across the top from the second to the last.
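A tiny snippet (plain Python, hypothetical parameter names) makes the layout rule concrete:

    # Rows run from the first parameter to the second-to-last; columns run
    # from the second parameter to the last. Each unordered parameter pair
    # then gets exactly one block in the grid.
    params = ["A", "B", "C", "D", "E"]

    for row in params[:-1]:                     # A, B, C, D down the side
        later = params[params.index(row) + 1:]  # columns to the right
        print(row, "pairs with", later)
    # A pairs with ['B', 'C', 'D', 'E']
    # B pairs with ['C', 'D', 'E']
    # C pairs with ['D', 'E']
    # D pairs with ['E']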

Example: If the parameters in the plan are ‘A,’ ‘B,’ ‘C,’ ‘D,’ and ‘E,’ you will have a Coverage Matrix that looks like this one:

 

generic matrix

Using the Coverage Matrix

Since each Hexawise-generated test includes many pairs (or sometimes just a few), the test designer may not know exactly when each pair is covered. The Coverage Matrix allows the tester to visualize and communicate when a precise pair is first covered in their test set.

 

Let’s take a detailed look at exactly how the Coverage Matrix accomplishes this.

 

For this example, we have a Mortgage Application system being tested. We achieve 81% pairwise coverage after just 8 tests. What kinds of pairs do we see being covered by our 8th test?

 

matrix indicators explained

We also see there are sporadic red squares, since we are viewing only 81% coverage of all possible pairs. So we review this chart with our stakeholders, and they like it. But then they see towards the bottom that Region 2 is not tested with Investment Property. They tell us that situation needs to be covered and ask how we'll cover it.

By opening Hexawise, we can drag the slider to see when that red square turns green.

 

test 9

In this case, it’s the very next test: test 9. So we alert the stakeholders that we can cover that scenario by executing 9 tests, which bumps our overall coverage to 86% pairwise coverage.

 

In other similar situations, a given “high-priority” combination might not appear until significantly later in a plan. If a stakeholder-requested combination did not appear until near the end in this example, we might be best off executing the first 8 Hexawise tests and then addressing that stakeholder-requested combination in a one-off test outside of Hexawise. (There are “fancier” options for achieving that goal, such as using Hexawise’s “freeze” function, but that’s a different topic for a different blog post. This one is lengthy as it is.)

 

Related Considerations

Anyone still reading at this point deserves a couple bits of extra insight:

First, it is important to keep in mind that Hexawise orders your tests in an optimal way so that if you stop testing after any given test, you will have covered as much as possible within the amount of time you have had to test. You can see this by comparing the (much larger) number of squares that turn green in your first test to the (much smaller) number of squares that turn green in your final test.
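If you export your tests, you can verify that front-loading yourself by counting how many value pairs each test covers for the first time. A minimal sketch (plain Python over a made-up test set):

    from itertools import combinations

    tests = [
        {"Browser": "IE", "OS": "Windows", "User": "Admin"},
        {"Browser": "Firefox", "OS": "Mac", "User": "Regular"},
        {"Browser": "Firefox", "OS": "Windows", "User": "Regular"},
    ]

    seen = set()
    for i, test in enumerate(tests, start=1):
        pairs = {tuple(sorted(p)) for p in combinations(test.items(), 2)}
        new = pairs - seen       # pairs this test covers for the first time
        seen |= pairs
        print(f"Test {i}: {len(new)} newly covered pairs")
    # Test 1: 3 newly covered pairs
    # Test 2: 3 newly covered pairs
    # Test 3: 2 newly covered pairs   <- later tests add less and less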

Second, if you are discussing the Coverage Matrix with a stakeholder who is new to Hexawise’s optimized coverage approach, they might need a brief explanation of why pairwise testing coverage is such an effective starting point for test prioritization. Introductory articles such as “Why do most software tests suck?” and “Does pairwise testing really work?” might be helpful. It is equally important to remember that high-priority scenarios that need to consist of more than 2 values should also be included in your test sets. If a stakeholder suggests including an involved scenario in your tests, keep two things in mind: you can easily force such scenarios to be included using Hexawise’s Requirements feature, and those seeded high-priority scenarios will appear in the very first tests in your Hexawise test set, so you do not need the new Coverage Matrix to confirm that they are being included.

 

A Few Final Thoughts

This is one of our favorite Hexawise features we have ever released. We’re excited to make it available and truly hope you find it useful. We are constantly seeking new ways to make it easier for software testers to create unusually powerful, thorough, efficient, and effective test sets, and new ways to help testers and stakeholders communicate clearly about the testing coverage their tests are achieving. We hope that the new Coverage Matrix feature will make it easier for you to communicate the superior testing coverage you’re achieving when you discuss your Hexawise-generated tests with stakeholders.

By: Jordan Weck on Jan 29, 2015

Categories: Combinatorial Software Testing, Pairwise Software Testing, Combinatorial Testing, Pairwise Testing, Hexawise test case generating tool

We have written before about the general question of "Should I Use One Test Plan or Multiple Plans?" This post addresses the same question with a focus on plans that have a relatively large percentage of constraints (e.g., a relatively large number of Invalid Pairs and/or Married Pairs).

Hexawise creates a Plan Scorecard that analyzes every set of tests Hexawise users generate. The Plan Scorecard exists to help you identify potential problems with plans you create and to make you aware of possible ways to improve your sets of tests. One of the notifications the Plan Scorecard provides goes like this:

Consideration: "53% of the parameter values are directly or indirectly constrained."
Explanation: Test plans with more than half of the parameter values constrained are often trying to do too much. They may be better broken into more than one test plan.

Constraints are used to prevent "impossible to test for" scenarios from being generated by the Hexawise test case generator. For more information about entering constraints into Hexawise, see these explanations of Invalid Pairs and Married Pairs. If you do not use Invalid Pairs or Married Pairs in a plan, 0% of your parameter values would be constrained.
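As a rough illustration of the metric (a deliberate simplification: Hexawise's actual scorecard also counts indirectly constrained values, while this plain-Python sketch counts only values that appear directly in a constraint):

    # Hypothetical plan: 7 parameter values in total, 2 of which appear
    # directly in a constraint (e.g., one Invalid Pair).
    plan = {
        "Pizza": ["Meat", "Vegetarian"],
        "Topping": ["Pepperoni", "Mushroom", "Olives"],
        "Crust": ["Thin", "Deep dish"],
    }
    constrained_values = {("Pizza", "Vegetarian"), ("Topping", "Pepperoni")}

    total_values = sum(len(values) for values in plan.values())
    pct = 100 * len(constrained_values) / total_values
    print(f"{pct:.0f}% of parameter values are directly constrained")  # 29%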

What should you do if you get a notification that your plan is highly constrained? Simple: consider your options. Specifically, consider the pros and cons of splitting the plan you're working on into multiple separate plans.

 

Why can heavily-constrained plans be a problem?

  • A number above 50% or so indicates that it might make sense to consider breaking your plan into 2 or more plans.
  • Why? Because with more and more constraints in a plan, keeping track of them all, making sure they're all accurate, and making sure the constraints in certain parts of your plan do not conflict with constraints in other parts of your plan in unintended ways, can start to take a lot of mental energy.
  • Furthermore, if your constraints do begin to conflict with one another, that could make it impossible for the Hexawise algorithm to identify valid values to populate in some parts of some of your tests. When that happens, instead of an actual value appearing in a test case, you will see the words "No Possible Value" appear.

 

Why are multiple simpler plans often preferable to one more complex one?

  • It is often much easier and quicker (from a modeling standpoint) to create two different plans. Creating two separate plans instead of one single plan often makes it possible to eliminate the need for the majority of the constraints in your plan.
    • For example, if you had a pizza ordering application where there were a lot of constraints around the value "meat pizza" and a lot of constraints around the value "vegetarian pizza," it could be attractive to create one plan (e.g., one set of tests) for meat pizzas and a different set of tests for vegetarian pizzas.
  • Simpler plans with fewer constraints tend to be easier to understand, modify, and maintain.

 

What are practical considerations when splitting a single plan into multiple ones?

  • To determine where / how to split a plan, begin by asking "what values have the most constraints associated with them?"
    • In the example above, "meat pizza" and "veggie pizza" had the most constraints; creating one plan for meat pizzas and a separate plan for veggie pizzas was the way to go. It would not have made sense to split the plan into one plan for scenarios paid in cash and another for scenarios paid with credit cards if the payment types did not have many Invalid Pairs or Married Pairs associated with them.
    • We were recently talking to a client where 58% of their plan's parameter values were constrained. We helped them look at where most of the constraints were coming from. It turned out that "Timing of Loan Payment" was the main culprit. As a result, we suggested they consider three separate plans: one for Delinquent Payment Scenarios, one for Regular/Timely Loan Payment Scenarios, and one for Loan Pre-Payment Scenarios.
    • While working with another client on a highly constrained plan, we found that "Type of User" was the source of most of the constraints. Super-Users were allowed to perform all kinds of activities on the System Under Test, while "Regular Users" were able to perform a far more limited number of actions. It made sense in that case to break the original combined plan into two separate plans: one for Admin User Scenarios and one for Regular User Scenarios.
  • After determining where to split a plan, the next steps tend to be relatively straightforward. If you're starting with one combined plan and want to break it into two plans, we would recommend these steps:
    • Start by creating 3 copies of the same plan:
      • Make a copy of the original combined plan so you can easily go back to an earlier known version if things start to go horribly wrong (or if you realize that the multiple plan strategy results in significantly more tests than the original single plan version)
      • Make a copy that you will modify for, e.g., "Regular User Scenarios"
      • Make a copy that you will modify for, e.g., "Admin User Scenarios"
    • Take advantage of Hexawise's Bulk Edit feature and tailor each plan as needed:
      • Delete any unnecessary Parameters, Invalid Pairs, Married Pairs, Requirements, and Expected Results
      • Add high-priority scenarios as necessary

 

What disadvantage might there be to multiple plans?

  • A potential disadvantage to a multiple plan approach is that it sometimes results in more tests generated than a single test plan approach would.

By: Justin Hunter on Aug 14, 2014

Categories: Combinatorial Software Testing, Hexawise, Hexawise tips, Testing Checklists, Testing Strategies

I run into the same problem quite often: people have a hard time distinguishing between good and bad tests.

So what are ‘bad tests’?

Bad tests are those that don’t make the tester learn as much as possible with each test step. Said another way: bad tests are repetitive.

Repetitive tests are time wasters. They make testing a mundane task to be completed. Repetitive tests don’t emphasize the complexity inherent in most systems today. And they let things (read: bugs, defects, faults) slip through into production.

Experiment Time!

Question: How bad are bad tests?

Research: (not much out there)

Hypothesis: Bad tests are deceptively bad.

Experiment:

  1. Choose some tests.
  2. Model their ideas in Hexawise.
  3. Lock-in bad tests as Requirements.
  4. Find out how many of the interactions bad tests cover.
  5. Then see how many tests are needed to cover all pairwise interactions.

Step 1: Choosing Tests

I took a set of tests that the testers all agreed covered the functionality that needed to be covered for the story they were testing.

Step 2: Test Designing

I sat down and analyzed them. They were as repetitive as any I’d seen. I sifted through them to pull out the main testing ideas. These I would use as parameters and values. I entered these into Hexawise.

Step 3: Lock-in Bad Tests

I used our Requirements feature to lock in their repetitive tests, so Hexawise would be forced to use their tests first. (This would allow me to measure how many interactions each of their tests covered.)

Step 4: Analyzing Bad Tests

This is the chart Hexawise produced. In 30 tests, they covered 47% of the total possible pairwise interactions.

30 bad tests

Before we go on: Do you notice that there are little plateaus? From Test 5 to 6, 7 to 8, 14 to 15, 20 to 21, 22 to 23, 26 to 27, and 29 to 30? That means those were tests that did not include ANY new pairs. No new information could have been learned from those tests. Learning literally plateaued. 7 times. Nearly 1 out of 4 tests didn't teach the testers anything.
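Spotting those plateaus is simple arithmetic once you have the cumulative coverage counts behind the chart. A sketch with made-up numbers:

    # Cumulative count of covered pairs after each test; a repeat of the
    # previous number means that test taught us nothing new.
    cumulative_pairs = [20, 31, 38, 44, 47, 47, 52, 52, 58, 61]

    plateaus = [i + 1 for i in range(1, len(cumulative_pairs))
                if cumulative_pairs[i] == cumulative_pairs[i - 1]]
    print("Tests that added no new pairs:", plateaus)  # [6, 8]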

Then I unleashed the Kraken Hexawise.

Step 5: Let Hexawise Optimize

I removed those Requirements to see how many tests Hexawise needs to cover all of the interactions in this specific functionality.

30 good tests

Okay, to be honest, I wanted Hexawise to do it in like 20 tests. (More Coverage. Fewer Tests.) But it used 30 (More Coverage). BUT (and this is a big BUT snickers) Hexawise covered 100% of the pairwise interactions in 30 tests.

Lessons Learned

No one would have guessed their tests were that bad by just reading through them. They looked perfectly fine. They read like good test cases. But as we started to visualize their coverage, we saw that perhaps they weren't achieving all they could. And when we compared bad tests to Hexawise tests, more coverage (in the same number of tests) is a clear winner.

In short:

  • Bad tests are deceptively bad
  • Sometimes you have to prove it
  • Pairwise tests can alleviate bad-test-itis

By: Jordan Weck on Jul 2, 2014

Categories: Combinatorial Software Testing, Pairwise Software Testing, Software Testing Efficiency

Begin with the “Goldilocks Rule” in mind to identify how much detail is appropriate.

unnamed

If your tests cover a large scope, as in a set of end-to-end tests of a process, focus first on entering only the most important elements of that process into Hexawise. If your tests cover a small scope, as in tests that focus on a few items on a single screen, the amount of detail you will want to include in Hexawise is higher. Learn more about the Goldilocks Rule.

 

Imagine explaining your System Under Test to someone’s mother. Start with 5 things that may change in each test

Suggestion: do not start this process with detailed Requirements or Technical Specifications. Instead, start with your basic common sense description of some things that would change from test to test.

  • If you were explaining the application to someone’s mother, how would you explain what it does in 2 minutes?
  • What kinds of things would be important to vary from test to test?
    • Hardware and software configurations?
    • User types?
    • Different actions that a user might take?
  • Identify 5 things that change from test to test and turn those 5 things into your first Parameters (a sketch of such a starter model follows this list).
    • How might those things change? Add one or more Values for each Parameter.
    • At this point, general descriptions might be fine (e.g., "SUVs" or "Economy cars" rather than "Toyota Corolla").
    • Remember that, where possible, you should avoid creating long lists of values.
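Here is what such a starter model might look like, sketched in plain Python (all parameter and value names are illustrative, not prescriptive):

    # Five things that vary from test to test, each with a few broad values.
    # Broad equivalence classes ("Economy car") beat long lists of specifics.
    draft_plan = {
        "Device": ["Desktop", "Tablet", "Phone"],
        "Browser": ["Chrome", "Firefox", "Safari"],
        "User type": ["New customer", "Returning customer"],
        "Action": ["Browse", "Search", "Purchase"],
        "Payment": ["Credit card", "Gift card"],
    }

    for parameter, values in draft_plan.items():
        print(f"{parameter}: {', '.join(values)}")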

 

Create a draft set of tests, assess obvious big gaps in the tests, and start filling them.

What obvious types of scenarios are missing? Add parameters and values as necessary to fill those big, obvious holes.

 

Create tests again, assess whether you’re covering necessary business rules and requirements.

If your tests are not yet testing a business rule or Requirement that you want to test:

  • Consider adding a Parameter or Value
  • Consider adding a specific combination of values to be tested together in the requirements tab

 

Reduce scope if the test plan is too complex

  • Consider cutting the scope of the plan (e.g., create two different plans with largely similar parameters – one for regular users and one for special users – instead of one big plan which tries to “do it all.”)
  • Consider changing the way you are describing "hard-coded" values. Instead of "iPhone 4S with International roaming" (which might not be a valid option after the first part of the draft test case suggests a transaction for a corporate customer from a Northern location responding to the special holiday offer…), consider using descriptive parameters and values along the lines of "from the phones available at this point in the test case, select an option that meets as many of these conditions as you can…"

 

Consider adding additional details into your plan from other sources.

Other sources for test ideas could be:

J Michael Hunter’s “You Are Not Done Yet”
Elisabeth Hendrickson’s Test Heuristics Cheat Sheet
Hans Buwalda’s Soap Opera Testing

By: John Hunter on Jun 17, 2014

Categories: Combinatorial Software Testing, Software Testing

Here at Hexawise, we aim to make the process of testing easier and more efficient. One way in which we've done this is by promoting our test design tool. But we were seeing people make test plans too complicated. So we came up with an easy way to create super powerful test plans that stay simple and effective.

What did we come up with, you ask?

'Start with a verb and a noun.'

The idea of using a verb and a noun to describe the appropriate scope for a set of tests has been used by Eduardo Miranda. As he points out, if you find yourself tempted to add testing ideas (e.g., explore the help files in depth) that do not easily fit into your chosen verb and noun (e.g., "book a flight"), that can be a useful red flag; accordingly, you might want to exclude those new test ideas that "don't fit" from the test scenarios in your current scope of tests.

This strategy is useful for two main reasons.

One - It is a great leaping off point from which to create interesting scenarios.

The tester is forced to understand and question their system under test. For some, this is a radically different idea of what their job is. We typically hear something like "You mean, I can't just 'validate the file exists?'"

Two - The 'verb and noun' strategy requires you to remain specific to one common goal.

Test plans get bloated when you start incorporating disparate ideas. This is commonly seen when testing a system that would be described as 'Apply for Loan' and you start adding in ideas to 'explore help files.' While exploring the help files will be necessary at some point, it probably will not trigger results needed for successfully testing your application process.

Now, let's explore this first reason:

You start by choosing any verb and noun.

Verb and Noun 1

Then you have to create questions to understand that verb and noun.

Newspaper questions

Then answer your questions. This is important. If you can't answer them, how could you possibly test the system?

Answer the questions

These ideas of questions and answers lend themselves quite well to be used as test steps or scenario planning. Below you can see how well they imported into Hexawise.

Add those into Hexawise

Generating tests is the only thing left to do before testing.

5 Click on Create Tests

Hopefully you've enjoyed this exploration on making simple and effective tests using the 'Verb and Noun' process.

By: Justin Hunter on Jun 4, 2014

Categories: Combinatorial Software Testing

Recently, the following defect made the news and was one of the most widely shared articles on the New York Times web site. Here's what the article, "Computer Snag Limits Insurance Penalties on Smokers," said:

A computer glitch involving the new health care law may mean that some smokers won’t bear the full brunt of tobacco-user penalties that would have made their premiums much higher — at least, not for next year.

The Obama administration has quietly notified insurers that a computer system problem will limit penalties that the law says the companies may charge smokers, The Associated Press reported Tuesday. A fix will take at least a year.

 

Tip of the Iceberg

This defect was entirely avoidable and predictable. It's safe to expect that hundreds (if not thousands) of similar defects related to Obamacare IT projects will emerge in the weeks and months to come. Had testers used straightforward software test design prioritization techniques, bugs like these would have been easily found. Let me explain.

 

There's no Way to Test Everything

If the developers and/or testers were asked how this bug could sneak past testing, they might at first say something defensive, along the lines of: "We can't test everything! Do you know how many possible combinations there are?" If you include 40 variables (demographic information, pre-existing conditions, etc.) in the scope of this software application, there would be:

41,231,686,041,600,000

possible scenarios to test. That's not a typo: 41 QUADRILLION possible combinations. That is, it would take 13 million years to execute those tests if we could execute 100 tests every second. There's no way we can test all possible combinations, so bugs like these are inevitably going to sneak through testing undetected.
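The arithmetic behind those numbers is easy to check:

    # 41 quadrillion possible scenarios, executed at 100 tests per second.
    total_combinations = 41_231_686_041_600_000
    tests_per_second = 100

    seconds_needed = total_combinations / tests_per_second
    years_needed = seconds_needed / (60 * 60 * 24 * 365)
    print(f"{years_needed:,.0f} years")  # about 13 million years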

 

The Wrong Question

When the developers and testers of a system say there is no way they could realistically test all the possible scenarios, they're addressing the wrong challenge. "How long would it take to execute every test we can think of?" is the wrong question. It is interesting but ultimately irrelevant that it would take 13 million years to execute those tests.

 

The Right Question

A much more important question is "Given the limited time and resources we have available for testing, how can we test this system as thoroughly as possible?" Most teams of developers and software testers are extremely bad at addressing this question. And they don't realize how bad they are. The Dunning-Kruger effect often prevents people from understanding the extent of their incompetence; that's a different post for a different day. After documenting a few thousand tests designed to cover all of the significant business rules and requirements they can think of, testers will run out of ideas, shrug their shoulders in the face of the overwhelming number of total possible scenarios, and declare their testing strategy to be sufficiently comprehensive. Whenever you're talking about hundreds or thousands of tests, that test selection strategy is a recipe for incredibly inefficient testing that both misses large numbers of easily avoidable defects and wastes time by testing certain things again and again. There's a better way.

 

The Straightforward, Effective Solution to this Common Testing Challenge: Testers Should Use Intelligent Test Prioritization Strategies

If you create a well-designed test plan using scientific prioritization approaches, you can reduce the number of individual tests tremendously. It comes down to testing the system as thoroughly as possible in the time that's available for testing. There are well-proven methods for doing just that.

 

There are Two Kinds of Software Bugs in the World

Bugs that don't get found by testers sneak into production for one of two main reasons, namely:

  • "We never thought about testing that" - An example that illustrates this type of defect is one James Bach told me about. Faulty calculations were being caused by an overheated server that got that way because of a blocked vent. You can't really blame a tester who doesn't think of including a test involving a scenario with a blocked vent.

  • "We tested A; it worked. We tested B; it worked too.... But we never tested A and B together." This type of bug sneaks by testers all too often. Bugs like this should not sneak past testers. They are often very quick and easy to find. And they're so common as to be highly predictable.

 

Let's revisit the high-profile Obamacare bug that will impact millions of people and take more than a year to fix. Here's all that would have been required to find it:

  • Include an applicant with a relatively high (pre-Medicare) age. Oh, and they smoke.

 

Was the system tested with a scenario involving an applicant who had a relatively high age? I'm assuming it must have been.

Was the system tested with a scenario involving an applicant who smoked? Again, I'm assuming it must have been.

Was the system tested with a scenario involving an applicant who had a relatively high age who also smoked? That's what triggers this important bug; apparently it wasn't found during testing (or found early enough).

 

If You Have Limited Time, Test All Pairs

Let's revisit the claim of "we can't execute all 13 million-years-worth of tests. Combinations like these are bound to sneak through, untested. How could we be expected to test all 13 million-years-worth of tests?" The last two sentences are preposterous.

  • "Combinations like these are bound to sneak through, untested." Nonsense. In a system like this, at a minimum, every pair of test inputs should be tested together. Why? The vast majority of defects in production today would be found simply by testing every possible pair of test inputs together at least once.

  • "How could we be expected to test all 13 million-years-worth of tests?" Wrong question. Start by testing all possible pairs of test inputs you've identified. Time-wise, that's easily achievable; its also a proven way to cover a system quite thoroughly in a very limited amount of time.

 

Design of Experiments is an Established Field that was Created to Solve Problems Exactly Like This; Testers are Crazy Not to Use Design of Experiments-Based Prioritization Approaches

The almost 100-year-old field of Design of Experiments is focused on finding out as much actionable information as possible in as few experiments as possible. These prioritization approaches have been very widely used with great success in many industries, including advertising, manufacturing, drug development, agriculture, and many more. Design of Experiments test design techniques (such as pairwise testing and orthogonal array / OA testing) are increasingly being used by software testing teams, but far more teams could benefit from these smart test prioritization approaches. We've written posts about how Design of Experiments methods are highly applicable to software testing here and here, and put an "Intro to Pairwise Testing" video here. Perhaps the reason this powerful and practical test prioritization strategy remains woefully underutilized by the software testing industry at large is that there are too few real-world examples explaining "this is what inevitably happens when this approach is not used... and here's how easy it would be to avoid it happening to you in your next project." Hopefully this post helps raise awareness.

 

Let's Imagine We've Got One Second for Testing, Not 13 Million Years; Which Tests Should We Execute?

Remember how we said it would take 13 million years to execute all of the 41 quadrillion possible tests? That calculation assumed we could execute 100 tests a second. Let's assume we only have one second to execute tests from those 13 million years worth of tests. How should we use that second? Which 100 tests should we execute if our goal is to find as many defects as possible?

If you have a Hexawise account, you can log in to view the test plan details and follow along in this worked example. To create a new account for free in a few seconds, go to hexawise.com/free.

By setting the 40 different parameter values intelligently, we can maximize the testing coverage achieved in a very small number of tests. In fact, in our example, you would need to execute only 90 tests to cover every single pairwise combination.

The number of total possible combinations (or "tests") depends on how many parameters (items/factors) there are and how many options (parameter values) each parameter has. In this case, the number of total possible combinations of parameters and values equals 41 quadrillion.

 

insurance-bug-1

This screen shot shows a portion of the test conditions that would be included in the first 4 of the 90 tests needed to provide full pairwise coverage. Sometimes people are not clear about what "test every pair" means. To make this more concrete, by way of a few specific examples, pairs of values tested together in the first part of test number 1 include:

  • Plan Type = A tested together with Deductible Amount = High

  • Plan Type = A tested together with Gender = Male

  • Plan Type = A tested together with Spouse = Yes

  • Gender = Male tested together with State = California

  • Spouse = Yes tested together with Smoker = Yes (and over 5 years)

  • And lots of other pairs not listed here

 

insurance-bug-2

This screen shot shows a portion of the later tests. You'll notice that some values are shown in purple italics. Those values are not providing new pairwise coverage. In the first tests, every single parameter value provides new pairwise coverage; toward the end, few parameter value settings provide new pairwise coverage. Once a specific pair has been tested, retesting it doesn't provide additional pairwise coverage. Sets of Hexawise tests are "front loaded for coverage." In other words, if you need to stop testing at any point before the end of the complete set of tests, you will have achieved as much coverage as possible in the limited time you have to execute your tests (whether that is 10 tests or 30 tests or 83). The pairwise coverage chart below makes this point visually; the decreasing number of newly tested pairs of values in each test accounts for the diminishing marginal returns per test.

 

You Can Even Prioritize Your First "Half Second" of Tests To Cover As Much As Possible!

insurance-bug-3

This graph shows how Hexawise orders the test plan to provide the greatest coverage quickly. So if you get through just 37 of the 90 tests needed for full pairwise coverage, you have already achieved over 90% pairwise coverage. The implication? Even if just 37 tests were executed, there would be a 90% chance that any given pair of values you might select at random would be tested together in the same test case by that point.

 

Was Missing This Serious Defect an Understandable Oversight (Because Quadrillions of Possible Combinations Exist) or Was It Negligent (Because Only 90 Intelligently Selected Tests Would Have Detected It)?

A generous interpretation of this situation would be that it was "unwise" for testers to fail to execute the 90 tests that would have uncovered this defect.

A less generous interpretation would be that it was idiotic not to conduct this kind of testing.

The health care reform act will introduce many changes such as this one. At an absolute minimum, health insurance firms should be conducting pairwise tests of their systems. Given the defect-finding effectiveness of pairwise testing coverage, testing systems any less thoroughly is patently irresponsible. And for health insurance software testing, it is often wiser to expand to testing all triples or all quadruples, given the interactions among the many variables in health insurance software.

Incidentally, to get full 3-way test coverage (using the same example as above) would require 2,090 tests.

 

Related: Getting Started with a Test Plan When Faced with a Combinatorial Explosion - How Not to Design Pairwise Software Tests - Efficient and Effective Test Design

By: Justin Hunter on Sep 26, 2013

Categories: Combinatorial Software Testing, Hexawise test case generating tool, Multi-variate Testing, Pairwise Software Testing, Software Testing, Testing Strategies

A client informed us that they had created (and used) approximately 3,500 test cases to test the search functionality of their application. They had strong suspicions that (a) they should be able to test the search functionality with fewer tests, (b) the tests they had created accidentally omitted many hundreds of plausible combinations of values that would be useful to test (though they did not know how to precisely identify where those gaps were without a huge amount of work), and (c) many of the tests were quite inefficient in that they repeated many steps already covered by other tests in the plan (even if no test was 100% duplicative of any other single test).

This client should have spoken to Lanette Creamer before they got into that situation. Lanette is a testing expert and blogger with ideas worth paying attention to. For example, her paper, Reducing Test Case Bloat, is well worth reading, as is her blog.

There are times when what you cut may not be bloat. There are some situations where the decisions are the equivalent of “Do we cut off the arm or the head?” Well, a person can live without an arm. If you are in a situation where you are so time constrained that critical areas will be untested, you can still communicate the risk, be transparent and use a strategy to test the most important areas first. It is possible to plan for and do testing for a very time constrained project.

 

Of course, avoiding this situation is best. Improving testing processes to use the best ideas and tools is a better option. Cutting the bloat allows resources to be applied to the areas where they are really needed. Often, though, people are scared of trying new ideas and cling to old methods, even if that forces the organization to take increased risks by failing to test critical areas sufficiently. They are more comfortable repeating "we need more funding if you want more testing" than trying new ideas.

Lanette's article provides 8 specific suggestions for process improvements to reduce bloat. The first suggestion is to use combinatorial testing tools to greatly improve coverage while reducing workload. Another suggestion is to run the bloat reduction ideas by the stakeholders.

As part of your plan to reduce bloat, it can be helpful to state your assumptions about who is important and where you are placing testing priority and why. When reducing test case bloat you are taking a calculated risk. You are weighing the risk of being unable to test new features by insisting on testing every legacy case against the risk of purposefully not running some tests. When you share your starting assumptions with your stakeholders you offer them the chance to counter with their own assumptions and often you can clarify the boundaries of your testing this way to avoid gaps in testing or duplication.

 

See the full article for more good ideas on how to get better results from the testing resources available to your organization.

 

Related: Design of Experiments is about Learning ASAP and, in Software Testing, Finding Bugs ASAP - Pairwise and Combinatorial Software Testing in Agile Projects - Cem Kaner: Testing Checklists = Good / Testing Scripts = Bad?

By: John Hunter and Justin Hunter on Apr 24, 2013

Categories: Combinatorial Software Testing, Efficiency, Multi-variate Testing, Software Testing, Testing Strategies

This post addresses some comments and skeptical (in a good way) questions raised by Phil Kirkham in response to our recent post: The Software Testing Community Needs More Empirical Studies.

As background to my answers: The studies and dozens of proof of concept pilot projects that I’ve been directly involved with have sought to answer these 3 questions:

1) Is it actually faster to generate tests with Hexawise than creating and documenting them manually?

Consistent findings: Yes. It takes, on average, about 40% less time to create and document tests using Hexawise, because Hexawise allows testers to partially automate the test selection and test documentation steps.

2) Is it possible to generate smaller sets of tests that will be as thorough or more thorough than larger sets of manually created tests and allow testers to find more defects in less test execution time?

Consistent findings: Yes. Typically more than twice as many defects per tester hour. See, e.g., the IEEE Computer article written with 3 PhDs showing an increase in defects found per tester hour of 2.4 times. A more recent set of 10 proof of concept pilot projects at an insurance firm revealed 3.0 times as many defects per tester hour. See: Does pairwise testing really work? Evidence, data, and case studies.
This is because Hexawise-generated tests (or any pairwise tests, for that matter) consistently have dramatically less wasteful repetition than manually selected tests, and because Hexawise-generated tests leave no potential dual-mode faults untested (that is, no potential pairwise defects involving test inputs that have been contemplated by the tester and included in their models).

3) Finding more defects per tester hour is certainly nice, but do Hexawise-generated tests find MORE defects?

Consistent findings: Yes. Much smaller sets of Hexawise tests have consistently found more defects, on average by about 13%.

 

In answer to specific questions:

Phil Kirkham: “Seems a very basic measure of a tester’s productivity. How about the severity of the defects?”

JH: Agreed.

In my experience over more than 5 years of helping teams conduct these proof of concept pilot projects, pairwise and Hexawise-generated tests are just plain more effective at finding defects. They find more of ALL kinds of defects. My experience has been that the types of defects being found are not skewed towards less significant types of defects, nor do the tests miss severe defects. A case in point: at my old firm, one of the early adopters of orthogonal array testing approaches ran pilot project after pilot project with teams of testers reporting into him. I can’t remember the exact number of pilots he had conducted, perhaps 20 pilot projects or so, before he experienced a single defect that escaped the Hexawise-generated tests but was found by the much longer set of manually selected tests. So, a short, blunt, honest answer to your excellent question, is “Believe it or not, Phil, it almost never matters. This approach will find ALL of the defects you otherwise would have found. Plus additional ones.”*

*Major caveat here that calls into question my specific answer here (to your question concerning severity) as well as all of the results from all of the studies I’ve been involved with. Testers like you and me are strong proponents of Exploratory Testing. These studies, though, treat the test inputs and test cases as “frozen.” You have the test cases in list A (created manually) and the test cases in list B (created by using Hexawise). The ideas about what can be changed from test to test (parameters) and how each of those things can be changed (values) are identical in both lists. The difference is that list A has lots of wasteful repetition and lots of gaps in coverage. List B has neither. That’s the only difference. Then one tester executes the tests from list A and another tester executes the tests from list B. But what if you have an unskilled tester following rote scripts executing one set of tests and someone like you, Rob Sabourin, Michael Bolton, James Bach, Shmuel Gershon, Ajay Balamurugadas, etc., executing the second set? Whoa! All bets are off. What would happen is that skilled Exploratory Testers would use the Hexawise-generated test ideas (which they would not want to be overly-detailed), and go “off script” to explore interesting test ideas that they cooked up in real time as they were doing their testing. So skilled Exploratory Testers would be able to find defects (presumably including serious ones) that the written test cases, regardless of whether they were manually created or created by Hexawise, would not lead them to directly. That’s an important topic for another time. I’ll be talking about Exploratory Combinatorial Testing at the Conference of the Association of Software Testing – CAST – this year in my home town of Madison, Wisconsin. Since you’re also going, perhaps we could collaborate and you could share your experiences (good or bad) with the attendees. I’ll happily give you 10 minutes of my speaking time to share your experiences if you’d like.

PK: Are you only counting functional defects?

JH: Not explicitly. The directions I give to teams running these pilot projects are: Try to answer the 3 questions above. Report defects. We’re not after a count of “failed test cases.” We’re after a number of defects. Having said that, as you might suspect, most of the defects reported tend to be functional defects.

PK: What about all the other types of defect?

JH: They’re not reported as often but we count them too. In situations where one tester reports a “hard to spot” bug (e.g., one that might take a more experienced tester to identify), it raises the possibility that the bug is being reported not because one set of tests is superior to the other but because one tester is better than the other. Accordingly, in an effort to keep an apples to apples comparison, we talk with the tester and try to determine with the tester’s input whether the tester would have found that same defect with the other set of tests. If the answer is yes, we’d report the defect as “found” in both sets of tests. This doesn’t happen as often as you might suspect it would.

PK: Does the project type matter?

JH: Yes. Benefits tend to be relatively smaller when there are a disproportionately high percentage of small, discrete, one-off tests. And higher when there are more than 5 parameters that interact in meaningful ways. And easier to capture when the System Under Test does not have a lot of conditional branching logic.

PK: How about the devs they are working with and the practices they follow?

JH: I don’t have enough empirical evidence to say definitively. It’s used successfully by thousands of testers in waterfall projects and thousands of testers in Agile projects.

PK: What about the experience of the tester, does that make a difference?

JH: Even more important than experience level of the tester are, in order, (1) analytical ability, (2) willingness to try new things, and (3) willingness to ask questions. By my estimates about 50% of the testers I come across at our clients (almost all at Fortune 2000 firms) would not be able to design excellent sets of pairwise tests from scratch. This is because above average analytical ability is required for testers to select parameters and values from their Systems Under Test in a thoughtful way. Getting back to experience level, some of our strongest users at our clients are straight out of college. They start work, get exposed to Hexawise, “get it” and don’t look back. Interestingly, some testers who have been testing for, say, 10 years or more – while experienced – sometimes seem to be too set in their ways to embrace this rather different approach to designing tests.

PK: If they are working with top of the range developers (as some lucky testers are cough cough) then there aren’t that many functional bugs to be found and you’re looking at browser compatibility, usability, race conditions – is combinatorial testing going to find these more quickly?

JH: Yes. Absolutely. If you’d like to collaborate to test that and help gather empirical evidence that you could share at CAST, I would be happy to work with you to do just that. If your experience contradicts what I’m saying here, you’d have the floor to tell CAST participants what your actual experience was.

PK: I read the study in the link – 97% of defects could be found by pairwise combinatorial testing? Really? ALL types of defects? Really? How can pairwise find a defect caused by a missing or ambiguous or inconsistent requirement, or a performance or security issue?

JH: The statistics I quote are a lot lower than that. The pie chart I use averages out several studies done by PhDs that have found, on average, 84% of defects could be triggered by testing for all combinations of 2 test inputs. The 97% figure is eyebrow-raising on its own (regardless of industry). Given that it was in the medical device industry in the United States (one of the most litigious areas in the history of the world?), that statistic is particularly mind-boggling. What the PhDs in that study did was take a look at all of the medical devices that had been taken off the market in the United States as a result of software defects. Then they investigated how many test inputs would be required to trigger each of those defects. The authors of that study found that an astonishing 97% of those defects could have been triggered by just 2 test inputs.

PK: Love your passion and enthusiasm and I do have a beta of Hexawise to see if it can do anything for my productivity – and I might agree that there is a lack of empirical studies, not just among the testing community but the s/w community as a whole into the effectiveness or not of how software is produced

JH: Thanks. I hope you have positive experiences with using Hexawise and I’m happy to help you if ever have any questions about using Hexawise on your projects.

By: Justin Hunter on Mar 25, 2013

Categories: Combinatorial Software Testing, Efficiency, Pairwise Software Testing, Software Testing Efficiency, Testing Strategies

A combinatorial explosion occurs when the configuration settings, user actions, data entered, and so on create so many possible scenarios that it is impossible to test everything: the number of tests required to individually exercise every single possibility is many thousands of times greater than could realistically be executed.

Taking over an existing software application without a good test suite (or any test plan) is often daunting, and combinatorial explosion confronts you with an unfathomable number of possible tests. Hexawise is a software-as-a-service tool that helps software testers deal with this dilemma: it creates software test plans that provide far better coverage than is typically seen in practice, using a tiny fraction of the tests required for complete combinatorial coverage (that is, individually testing every possible pairwise or 3-, 4-, 5-... way combination).

The Google Maps test plan provides a good example of the combinatorial explosion faced by testers (in this case, those who tested Google Maps). Take a look at the Google Maps test plan by logging in to your Hexawise account (creating a demo account is free and simple). The Google Maps test plan is one of 9 samples currently provided in Hexawise.

When creating your own test plan, while you are exploring the software application to find where the weak points are, you will probably find it useful to vary things as much as possible and repeat your actions as little as possible. Those points hold whether you're doing relatively informal, lightly documented exploratory testing or more heavily documented test scripts. In addition, since a large percentage of defects can be triggered by the interaction of just two test inputs, it would be nice, if you had time, to test every single possible combination of two test inputs; that's the rationale behind allpairs, pairwise, and orthogonal array-based test case prioritization methods.

To recreate a similar - very early draft - plan for yourself, I'd suggest going through the following steps to put together a relatively small number of highly informative end-to-end-ish tests:

Ask what can change as users go through the system. Think about configuration settings, user actions, data formats, data ranges, etc. Even throw in more "creative" ideas like user personas. Let your creativity and common sense guide you. Enter those in as parameters.

Ask how those parameters can change (for the parameter "Browser," enter IE7, IE8, FF, etc.). Put those in as values under each parameter (entering constraints as required).

Ask whether that variation matters. When possible (when it doesn't matter as much), use equivalence classes and be biased towards fewer values, at least for your early draft tests.

Ask what special paths through the system you want to be sure to include (the most common happy path, paths that trigger certain business rules, etc.).

Click the Create Tests button in Hexawise and you'll instantly get a very nice draft starter set of highly varied tests. If they look relatively interesting and don't miss hugely important things, start informally executing them. As you do, you'll learn more about the system's weak points, which will send you back to those draft tests to iterate on them, making them stronger and covering more.

For a bit more on using this approach, see our case studies. Hexawise TV provides narrated videos online showing how to make your life easier as a software tester.

 

Related: Hexawise Tip: Using Value Expansions and Value Pairs to Handle Dependent Values - 3 Strategies to Maximize Effectiveness of Your Tests - Pairwise and Combinatorial Software Testing in Agile Projects

By: John Hunter and Justin Hunter on Jan 2, 2013

Categories: Combinatorial Software Testing, Exploratory Testing, Multi-variate Testing, Pairwise Software Testing, Software Testing, Testing Strategies

It's common to have a test plan where the possible values of one parameter depend on the value of another parameter. There are many options for how you can represent this scenario in Hexawise, some options that involve using value expansions (when there is equivalence) and other options that do not use value expansions (when there is not equivalence).

Using Value Expansions in Hexawise

The general rule of thumb for value expansions is that they are for setting up equivalence classes. The key there being the equivalence. The expectations of the system should be the same for every value listed in that particular value expansion.

Let's consider a real world example involving a classification parameter with a value that is dependent on the value of a role parameter:

Inputs
Role: Student, Staff
Classification: Freshman, Sophomore, Junior, Senior, Adjunct, Assistant, Professor, Administrator

So if the Role parameter has a value of Student, then the Classification parameter must have a value of Freshman, Sophomore, Junior or Senior, but if the Role parameter has a value of Staff, then the Classification parameter must have a value of Adjunct, Assistant, Professor or Administrator.

Using value expansions in this case might be a good option. You could set up your inputs, value expansions and value pairs this way:

Inputs
Role: Student, Staff
Classification: Student Classification, Staff Classification

Value Expansion
Student Classification: Freshman, Sophomore, Junior, Senior
Staff Classification: Adjunct, Assistant, Professor, Administrator

Value Pairs
When Role=Student Always Classification=Student Classification
When Role=Staff Always Classification=Staff Classification

You would use this approach if there were no important differences in the business logic or expected behavior of the system when the different expansions of the value were used. If Freshman versus Sophomore is an important label for the users to be able to enter and see, but the system under test doesn't change its behavior based on which value is selected, then those expansions of the value are equivalent and don't need to be tested individually for how they might interact with other parts of the system and create bugs. If this equivalence scenario is true, then you will greatly simplify things for yourself and create fewer tests that are just as powerful by using value expansions.

In the scenario that would support using value expansions, the system might have different behavior for a Junior versus an Adjunct Professor, but not for a Freshman versus a Senior. A Freshman and a Senior are always equivalent in the system, so they can be combined in a value expansion.
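
To make the mechanics concrete, here is a small Python sketch of the idea (this is not how Hexawise is implemented; the function is hypothetical): a generated test that references a value expansion can be turned into a concrete test by substituting any one of the equivalent members, since under equivalence it doesn't matter which one is used.

```python
import random

# Hypothetical value expansions (equivalence classes) from the example above
value_expansions = {
    "Student Classification": ["Freshman", "Sophomore", "Junior", "Senior"],
    "Staff Classification": ["Adjunct", "Assistant", "Professor", "Administrator"],
}

def expand(abstract_test):
    """Swap each expandable value for any one equivalent concrete member."""
    return {param: random.choice(value_expansions.get(value, [value]))
            for param, value in abstract_test.items()}

print(expand({"Role": "Student", "Classification": "Student Classification"}))
# e.g. {'Role': 'Student', 'Classification': 'Junior'}
```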

However, if the expectations are not the same, then a value expansion should not be used. For example, let's suppose this hypothetical system has business logic giving priority class scheduling to Seniors and only last available scheduling priority to Administrators. In this case, using value expansions as described above would probably be a mistake. Why? Because a Sophomore and a Senior aren't treated the same way by the system, yet Hexawise considers all the expansions of the Student Classification value as equivalent. As long as you've got a test that has paired a value expansion of the Student Classification value with the Overbooked value of the Class Status parameter, then Hexawise won't insist on pairing all the other value expansions for the Student Classification value with Class Status = Overbooked in other tests. You could therefore miss a bug that only occurs when a Senior signs up for an overbooked class.

"One to many" or "multi-valued" married pair model

If the system under test does not consider the values to be equivalent and has requirements and business logic to behave differently, then using value expansions to signal equivalency to Hexawise when there isn't any is probably a mistake.

So what would you do in that case?

We've decided that it might be nice to be able to set up your inputs and value pairs like this:

Inputs
Role: Student, Staff
Classification: Freshman, Sophomore, Junior, Senior, Adjunct, Assistant, Professor, Administrator

Value Pairs
When Role=Student Always Classification=Freshman, Sophomore, Junior, or Senior
When Role=Staff Always Classification=Adjunct, Assistant, Professor, or Administrator

Unfortunately, this kind of "one to many" or "multi-valued" value pair is something we've only recently realized would be very helpful. It is on the drawing board for Hexawise for the intermediate future, but it is not a feature of Hexawise today. In the meantime, you can model it with three parameters:

Inputs
Role: Student, Staff
Student Classification: Freshman, Sophomore, Junior, Senior, N/A
Staff Classification: Adjunct, Assistant, Professor, Administrator, N/A

Value Pairs
When Role=Student Always Staff Classification=N/A
When Role=Staff Always Student Classification=N/A
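
If it helps to see that constraint logic spelled out, here is a hedged Python sketch of the three-parameter workaround (plain filtering with itertools, not Hexawise's internals):

```python
from itertools import product

# The three-parameter workaround model from above, as plain data
roles = ["Student", "Staff"]
student_cls = ["Freshman", "Sophomore", "Junior", "Senior", "N/A"]
staff_cls = ["Adjunct", "Assistant", "Professor", "Administrator", "N/A"]

def allowed(role, s_cls, st_cls):
    # The two married pairs: the non-applicable classification must be N/A.
    # (In a real plan you'd likely also forbid N/A for the applicable one.)
    if role == "Student":
        return st_cls == "N/A"
    return s_cls == "N/A"

valid = [c for c in product(roles, student_cls, staff_cls) if allowed(*c)]
print(len(valid))  # 10 combinations satisfy the two stated constraints
```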

Another modeling option to consider, if there is only special logic for Administrator and for Seniors, but the rest of the values we've been discussing are equivalent, is to use value expansions for just the equivalent values:

Inputs
Role: Student, Staff
Classification: Underclassman, Senior, Professor, Administrator

Value Expansions
Underclassman: Freshman, Sophomore, Junior
Professor: Adjunct, Assistant, Full

Value Pairs
When Role=Student Never Classification=Professor
When Role=Student Never Classification=Administrator
When Role=Staff Never Classification=Underclassman
When Role=Staff Never Classification=Senior

I hope this helps you understand the role of value expansions in Hexawise: when to use them (in cases of equivalency), when to avoid them, and how value pairs and value expansions can be used together to handle cases of dependent parameter values. Value expansions are a powerful tool to help you decrease the number of tests you need to execute, so take advantage of them, and if you have any questions, just let us know!

By: John Hunter on Apr 26, 2012

Categories: Combinatorial Software Testing, Combinatorial Testing, Hexawise tips, Pairwise Testing, Software Testing, Testing Strategies

84 percent coverage in 20 tests

[Graph: Hexawise test coverage graph showing 83.9% coverage in just 20 tests]


Among the many benefits Hexawise provides is creating a test plan that maximizes test coverage with each new scenario tested. The graph above shows that after just 20 tests, 83.9% of the targeted combinations have been tested. Read more about this in our case study of a mortgage application software test plan. Just 48 tests are needed to cover every valid pair (3.7 million possible test combinations exist in this case). If you are lost now, this video may help.

The coverage achieved by the first few tests in the plan will be quite high (and the graph line will rise sharply), then the slope will decrease in the middle of the plan (because each new test will tend to cover fewer net new pairs of values for the first time), and at the end of the plan the line will flatten out quite a lot (because by the end, relatively few pairs of values remain to be tested together for the first time).

One of the benefits Hexawise provides is making that slope as steep as possible. The steeper the slope, the more efficient your test plan is. If you keep repeating pairs and triples that have already been tested together, while passing up the chance to cover untested pairs and triples, you will have to create and run far more tests than if you intelligently construct the plan. With many interactions to test, it is far too complex to derive an intelligent test plan manually. A combinatorial testing tool, like Hexawise, that maximizes test plan efficiency is needed.

For any set of test inputs, there is a finite number of pairs of values that could be tested together (though it can be quite a large number). The coverage chart answers, after each test: what percentage of the total number of pairs (or triples, etc.) that could be tested together have been tested together so far?
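
To make "percentage of possible pairs covered so far" concrete, here is a minimal Python sketch that computes that running percentage for a toy three-parameter plan (the parameters and tests are invented for illustration):

```python
from itertools import combinations, product

parameters = {                      # invented for illustration
    "Role": ["Student", "Staff"],
    "Browser": ["IE8", "FF", "Chrome"],
    "OS": ["Windows", "macOS"],
}

# Every pair of parameter values that could be tested together
all_pairs = {frozenset({(a, x), (b, y)})
             for a, b in combinations(parameters, 2)
             for x, y in product(parameters[a], parameters[b])}

def coverage_curve(tests):
    """Percent of all possible pairs covered after each test in the plan."""
    covered, curve = set(), []
    for test in tests:
        covered |= {frozenset({(a, test[a]), (b, test[b])})
                    for a, b in combinations(parameters, 2)}
        curve.append(round(100 * len(covered) / len(all_pairs), 1))
    return curve

print(coverage_curve([
    {"Role": "Student", "Browser": "IE8",    "OS": "Windows"},
    {"Role": "Staff",   "Browser": "FF",     "OS": "macOS"},
    {"Role": "Student", "Browser": "Chrome", "OS": "macOS"},
]))  # [18.8, 37.5, 56.2] -- each test here covers 3 of the 16 possible pairs
```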

The Hexawise algorithms achieve the following objectives, which help testers find as many defects as possible in as few tests as possible. In each and every step of each and every test case, the algorithm chooses a test condition that will maximize the number of pairs that can be covered for the first time in that test case (or the maximum number of triplets or quadruplets, etc., based on the thoroughness setting defined by the user). Allpairs (AKA pairwise) is a well known and easy to understand test design strategy. Hexawise lets users create pairwise sets of tests that cover every pair, and it also allows test designers to generate far more thorough sets of tests (3-way to 6-way coverage). This allows users to "turn up the coverage dial" and generate tests that cover every single possible triplet of test inputs at least once (or every 4-way, 5-way or 6-way combination).
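
Hexawise's actual algorithms aren't shown here, but the general flavor of greedy pairwise generation can be sketched in a few lines of Python. This is a simplified, AETG-style approach under my own assumptions, not Hexawise's algorithm: repeatedly draw random candidate tests and keep whichever covers the most not-yet-covered pairs.

```python
import random
from itertools import combinations, product

def pairwise_tests(parameters, candidates=50, seed=0):
    """Greedy sketch: keep the candidate test covering the most new pairs."""
    rng = random.Random(seed)
    names = list(parameters)
    uncovered = {frozenset({(a, x), (b, y)})
                 for a, b in combinations(names, 2)
                 for x, y in product(parameters[a], parameters[b])}

    def new_pairs(test):
        return {frozenset({(a, test[a]), (b, test[b])})
                for a, b in combinations(names, 2)} & uncovered

    tests = []
    while uncovered:
        pool = [{n: rng.choice(parameters[n]) for n in names}
                for _ in range(candidates)]
        best = max(pool, key=lambda t: len(new_pairs(t)))
        gained = new_pairs(best)
        if gained:                 # skip useless candidates; try a fresh pool
            uncovered -= gained
            tests.append(best)
    return tests

plan = pairwise_tests({
    "Browser": ["IE7", "IE8", "FF", "Chrome"],
    "OS": ["Windows", "macOS", "Linux"],
    "Role": ["Guest", "Member", "Admin"],
})
print(len(plan))  # typically well under the 36 exhaustive combinations
```

A production tool does considerably better than this naive sketch (deterministic construction, constraints, mixed strengths), which is exactly why hand-deriving such plans is impractical.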

Note that the coverage ratio Hexawise shows is based on the parameters and values entered as items to be tested: it is not a code coverage percentage. Hexawise sorts the test plan to front-load the coverage of value tuples, not the coverage of code paths. Coverage of code paths ultimately depends on how good a job the test designer did at extracting the relevant parameters and values of the system under test. You would expect some loose correlation between coverage of identified value tuples and coverage of code paths in most typical systems.

If you want to learn more about these concepts, I would recommend Scott Sehlhorst's articles on pairwise and combinatorial test design. They are some of the clearest introductory articles about pairwise and combinatorial testing that I have seen. They also contain some interesting data points related to the correlation between 2-way / allpairs / pairwise / n-way coverage (in Hexawise) and the white box metrics of branch coverage, block coverage and code coverage (not measurable by Hexawise).

In Software testing series: Pairwise testing, for example, Scott includes these data points:

  • We measured the coverage of combinatorial design test sets for 10 Unix commands: basename, cb, comm, crypt, sleep, sort, touch, tty, uniq, and wc... The pairwise tests gave over 90 percent block coverage.

  • Our initial trial of this was on a subset of Nortel's internal e-mail system, where we were able to cover 97% of branches with less than 100 valid and invalid test cases, as opposed to 27 trillion exhaustive test cases.

  • A set of 29 pair-wise... tests gave 90% block coverage for the UNIX sort command. We also compared pair-wise testing with random input testing and found that pair-wise testing gave better coverage.

Related: Why isn't Software Testing Performed as Efficiently and Effectively as it could be? - Video Highlight Reel of Hexawise – a pairwise testing tool and combinatorial testing tool - Combinatorial Testing, The Quadrant of Massive Efficiency Gains

Specific guidance on how to view the coverage percentage graph for your test plan in Hexawise:

When working on your test plan in Hexawise, click on the two downward arrows shown in the image to make the checklist visible:

[Screenshot: How-To Progress Checklist]

Then you'll want to open up the "Advanced" list, so you might need to click here:

[Screenshot: Advanced How-To Progress Checklist]

The detailed explanation will begin when you click on "Analyze Tests":

[Screenshot: Decreasing Marginal Returns coverage graph]

This post is adapted (and some new content added) from comments posted by Justin Hunter and Sean Johnson.

By: John Hunter on Feb 3, 2012

Categories: Combinatorial Software Testing, Combinatorial Testing, Efficiency, Multi-variate Testing, Pairwise Software Testing, Pairwise Testing, Scripted Software Testing, Software Testing, Software Testing Efficiency

Combinatorial Software Test Design - Beyond Pairwise Testing


I put this together to explain combinatorial software test design methods in an accessible manner. I hope you enjoy it and that, if you do, you'll consider trying this approach to create test cases for your next testing project (whether you choose our Hexawise test case generator or some other test design tool).


Where I'm Coming From

As those of you know who read my posts, read my articles, and/or have attended my testing conference presentations, I am a passionate proponent of these approaches to software test design that maximize variation from test case to test case and minimize repetition. It's not much of an exaggeration to say I hardly write or talk publicly about any other software testing-related topics. My own consistent experiences and formal studies indicate that pairwise, orthogonal array-based, and combinatorial test design approaches often lead to a doubling of tester productivity (as measured in defects found per tester hour) as compared to the far more prevalent practice in the software testing industry of selecting and documenting test cases by hand. How is it possible that this approach generates such a dramatic increase in productivity? What is so different between the manually-selected test cases and the pairwise or combinatorial test cases? Why isn't this test design technique far more broadly adopted than it is?


A Common Challenge to Understanding: Complicated, Wonky Explanation

My suspicion is that a significant reason that combinatorial software testing methods are not much more widely adopted is that many of the articles describing it are simply too complex and/or too abstract for many testers to understand and apply. Such articles say things like:

A. Mathematical Model


A pairwise test suite is a t-way interaction test suite where t = 2. A t-way interaction test suite is a mathematical structure, called a covering array.

Definition 1 A covering array, CA(N; t, k, |v|), is an N × k array from a set, v, of values (symbols) such that every N × t subarray contains all tuples of size t (t-tuples) from the |v| values at least once [8].

The strength of a covering array is t, which defines, for example, 2-way (pairwise) or 3-way interaction test suite. The k columns of this array are called factors, where each factor has |v| values. In general, most software systems do not have the same number of values for each factor. A more general structure can be defined that allows variability of |v|.

Definition 2 A mixed level covering array, MCA(N; t, k, (|v1|, |v2|, ..., |vk|)), is an N × k array on |v| values, where |v| = |v1| + |v2| + ... + |vk|, with the following properties: (1) Each column i (1 ≤ i ≤ k) contains only elements from a set Si of size |vi|. (2) The rows of each N × t subarray cover all t-tuples of values from the t columns at least once.

  • "Construct Pairwise Test Suites Based on the Bak-Sneppen Model of Biological Evolution" World Academy of Science, Engineering and Technology 59 2009 - Jianjun Yuan, Changjun Jiang


If you're a typical software tester, even one motivated to try new methods to improve your skills, you could be forgiven for not mustering up the enthusiasm to read such articles. The relevancy, the power, and the applicability of combinatorial testing - not to mention that this test design method can often double your software testing efficiency and increase the thoroughness of your software testing - all tend to get lost in the abstract, academic, wonky explanations that are typically used to describe combinatorial testing. Unfortunately for pragmatic, action-oriented software testing practitioners, many of the readily accessible articles on pairwise testing and combinatorial testing tend to be on the wonky end of the spectrum; an exception to that general rule is the set of good, practitioner-oriented introductory articles available at combinatorialtesting.com.
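
For what it's worth, the definitions above are less intimidating when restated as code. Here is a short Python sketch of the Definition 1 check (the function name and the tiny example array are my own, for illustration only):

```python
from itertools import combinations, product

def is_covering_array(rows, t, columns):
    """Definition 1 restated: for every choice of t columns, every possible
    t-tuple of those columns' values must appear in at least one row."""
    for cols in combinations(range(len(columns)), t):
        needed = set(product(*(columns[c] for c in cols)))
        seen = {tuple(row[c] for c in cols) for row in rows}
        if not needed <= seen:
            return False
    return True

# A strength-2 (pairwise) covering array: three 2-valued factors, four rows
columns = [["a", "b"], ["0", "1"], ["x", "y"]]
rows = [("a", "0", "x"), ("a", "1", "y"), ("b", "0", "y"), ("b", "1", "x")]
print(is_covering_array(rows, 2, columns))  # True
```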


A Different Approach to Explaining Combinatorial Testing and Pairwise Testing

In the photograph-rich, numbers-light presentation embedded above, I've tried to explain what combinatorial testing is all about without the wonkiness. The benefits of structured variation and combinatorial test design are, in my view, wildly under-appreciated. This approach has the following extremely important benefits:

  • Less repetition from test case to test case

    • In the context of discussing testing's "pesticide paradox," James Bach, I believe, used the analogy that following in someone's footsteps is a very good way to survive traversing a minefield but a generally lousy way to find software defects efficiently.
    • Maximizing variation from test case to test case, as a general rule, is an absolutely spectacular way to find defects quickly.
    • There are thousands, if not trillions, of relevant combinations to select from when identifying test cases to execute; computer algorithms will be able to solve the problem of "how can maximum variation be achieved?" far better than human brains can.
  • More coverage of combinations of test inputs

    • Most of the time, since awareness of pairwise and combinatorial testing methods remains low in the software testing community, combining all possible pairs of values in at least one test case is not even a conscious goal of testers.
    • Even if this were a goal of their test design strategy, testers would have a tremendous challenge in trying to achieve such a goal: with hundreds, thousands or tens of thousands of targeted combinations to cover, losing track of a significant number of them and/or forgetting to include them in software tests is virtually a foregone conclusion unless a test case generator is used.
    • More thorough coverage leads to more defects being found.
  • Efficiency (Testers can "turn the coverage dial" to achieve maximum efficiency with a minimal number of tests)

    • The efficiency and effectiveness benefits of pairwise testing have been demonstrated in testing projects in every major industry.
    • I wanted to prominently include the message that testers using test case generators have the option to dramatically increase the thoroughness of the tests they generate, because that topic often gets ignored in pairwise testing introductions and case studies.
  • Thoroughness - (Testers can also "turn the coverage dial" to achieve maximum thoroughness if that is their goal)

    • Too often, testers view pairwise as a technique that focuses on a very small number of curiously strong tests; that is only part of the story.
    • This can lead to the false impression that combinatorial testing methods are inappropriate where high levels of testing thoroughness are required.
    • You can create very different sets of tests that are as thorough as possible (given your understanding of what you are testing) no matter whether you have 1 hour to execute tests or one month to test.


Other Recommended Sources of Information on Pairwise and Combinatorial Testing:

By: Justin Hunter on Oct 7, 2010

Categories: Combinatorial Software Testing, Combinatorial Testing, Design of Experiments, Hexawise test case generating tool, Multi-variate Testing, Pairwise Software Testing, Pairwise Testing, Recommended Tool, Testing Strategies, Uncategorized

A friend passed me this set of recent tweets from Wil Shipley, a Mac developer with 11,743 followers on Twitter as of today. Wil recently encountered the familiar problem of what to do when you've got more software tests to run than you can realistically execute.

[Screenshot: Wil Shipley's tweets about having more software tests than he could realistically execute]

I love that. Who can't relate?

Now if only there were a good, quick way to reduce the number of tests from over a billion to a smaller, much more manageable set of tests that were "Altoid-like" in their curious strength. :) I rarely use this blog for shameless plugs of our test case generating tool, but I can't help myself here. The opening is just too inviting. So here goes:


"Wil,

There's an app for that... See www.hexawise.com for Hexawise, a "pairwise software test case generating tool on steroids." It eats problems like the one you encountered for breakfast. Hexawise winnows bazillions of possible test cases down in the blink of an eye to small, manageable sets of test cases that are carefully constructed to maximize coverage in the smallest number of tests, with flexibility to adjust the solutions based upon the execution time you have available. In addition to generating pairwise testing solutions, Hexawise also generates more thorough applied statistics-based "combinatorial software testing" solutions that include tests for, say, all possible 6-way combinations of test inputs.

Where your Mac cops an attitude and tells you "Bitch, I ain't even allocating 1 billion integers to hold your results" and showers you with taunting derisive sneers, head-waggling and snaps all carefully choreographed to let you know where you stand, Hexawise, in contrast, would helpfully tell you: "Only 1 billion total possibilities to select tests from? Pfft! Child's play. Want to start testing the 100 or so most powerful tests? Want to execute an extremely thorough set of 10,000 tests? Want to select a thoroughness setting in the middle? Your wish is my command, sir. You tell me approximately how many tests you want to run and the test inputs you want to include, and I'll calculate the most powerful set of tests you can execute (based on proven applied statistics-based Design of Experiments methods) before you can say "I'm Wil Shipley and I like my TED Conference swag."

More info at:
http://hexawise.tv/intro/
or
https://hexawise.com/Hexawise_Introduction.pdf
free trials at:
http://hexawise.com/signup

By: Justin Hunter on Jun 23, 2010

Categories: Combinatorial Software Testing, Combinatorial Testing, Interesting People , Pairwise Software Testing, Pairwise Testing, Recommended Tool, Software Testing