[Slide 46 of 95 from Cem Kaner's "Value of Checklists" presentation]

 

I highly recommend this presentation by Cem Kaner (available here as a pdf download of slides). It is provocative, funny, and insightful. In it, Cem Kaner makes a strong case for using checklists (and mercilessly derides many aspects of using completely scripted tests). Cem Kaner, as I suspect most people reading this already know, is one of the leading lights of software testing education. He is a professor of computer science at Florida Institute of Technology and has contributed enormously to software testing education by writing Testing Computer Software, "the best selling software testing book of all time," founding the [Center for Software Testing Education & Research](http://www.testingeducation.org/BBST/), and making an excellent free Black Box Software Testing course available online.

Here are a couple of my favorite slides from the presentation.

 

[Slide 82 of 95 from the presentation]

 

[Slide 83 of 95 from the presentation]

 

My own belief is that the presentation is very good and makes its points well. If I have a minor quibble, it is that in doing such a good job of laying out the case for checklists and against scripted testing, the presentation - almost by design - does not go into as much detail as I would personally like about a topic I think is extremely important and not written about enough: how practitioners should use an approach that blends the advantages of scripted tests (which can generate some of the huge efficiency benefits of combinatorial testing methods, for example) with checklist-based Exploratory Testing (which has the advantages pointed out so well in the presentation). A "both / and" option is not only possible; it is desirable.


Credit for bringing this presentation to my attention goes to Michael Bolton ([the testing expert](http://www.developsense.com/blog.html), of course, not the singer of "Office Space" fame), who posted a link to it. Thanks again, Michael. Your enthusiastic recommendation to pick up boxed sets of the BBC show Connections was excellent as well; the presenter of Connections is like a slightly tipsy genius with ADHD who possesses an incredible grasp of history, an encyclopedic knowledge of quirky scientific developments, and a gift for storytelling. I like how your mind works.

By: Justin Hunter on Nov 4, 2009

Categories: Scripted Software Testing, Software Testing Efficiency, Software Testing Presentations, Testing Case Studies, Testing Checklists, Testing Strategies

On October 6th, I informally launched testing.stackexchange.com as "the stackoverflow.com for Software Testing" without much hoopla. So far, less than a month later, with no advertising other than word of mouth, the initial results are very promising. We've had approximately:

  • 70 new users join as members and contributors
  • 50 software testing questions
  • 160 answers to those questions
  • 2,200 views of the questions and answers

The most important development is not reflected in the numbers above. More important, by far, than the number of participants who have joined is the quality of the people who are contributing. Members of the forum include some prominent experts, including Jason Huggins (creator of Selenium and co-founder of Sauce Labs), Alan Page and Bj Rollison (of "How We Test Software at Microsoft" fame), Michael Bolton (the testing expert, not the singer), Fred Beringer, Elisabeth Hendrickson, Joe Strazzere, Adam Goucher, Simon Morley, Rob Lambert, Scott Sehlhorst, and others. Given the high-quality people the site has attracted, the quality of the answers delivered has been quite high. Perhaps the quality is also above average because people answering know that their answers will be analyzed by thoughtful testers and voted up (or down) based on how good they are. In short, testers are asking good questions and getting them answered, which is why I created the site in the first place. I'm cautiously optimistic about the future.

Members so far include:

 

[Screenshot: testing.stackexchange.com members]

 

The most viewed questions so far include:

 

[Screenshot: most viewed questions on testing.stackexchange.com]

 

The most recent questions being asked and answered are:

 

[Screenshot: recent questions on testing.stackexchange.com]

 

I'd like to extend special thanks to Alan Page (who likes the idea so much that he has volunteered to join me as a co-manager/moderator of the site), to Shmuel Gershon, Jason from NC, and Joe Strazzere for being particularly active, and to Alan Page, Corey Goldberg, Shmuel Gershon, and Konstantin for helping to get the word out about the forum through their blog posts telling the world about testing.stackexchange.com. Without their combined help, we'd be nowhere. With their help and support, we're building a place where software testers can seek and receive high-quality, peer-reviewed answers to their testing questions.

Please help us succeed by spreading the word, asking a few questions, answering a few, and voting on the best answers.

Thanks everyone!

By: Justin Hunter on Nov 2, 2009

Categories: Interesting People , Software Testing

There are some phrases in English that, as often as not, come off sounding obligatory and/or insincere. The phrase "I'm honored..." comes to mind (particularly if someone is accepting an award in front of a room full of people).

Be that as it may, I genuinely felt really honored last night and again today by a couple of comments James Bach made about me, including these:

 

[Screenshot: James Bach's tweet about Hexawise results, Oct 23, 2009]

 

Here's the quick background: (1) James knows much more about software testing than I do and I respect his views a lot. (2) He has a reputation for not suffering fools gladly and for pretty bluntly telling people he doesn't respect them if he doesn't respect the content of their views. (3) In addition to his extremely broad expertise on "testing in general," James, like Michael Bolton, knows a lot about pairwise and combinatorial testing methods and how to use them. (4) I firmly (and passionately) believe that pairwise and combinatorial testing methods are (a) dramatically under-appreciated and (b) dramatically under-utilized. (5) James has published a very good and well-reasoned article about some of the limitations of pairwise testing methods that I wanted to talk to him about. (6) I co-wrote an article that IEEE Computer recently published about combinatorial testing that I wanted to discuss with him. (7) James and I have both been at the STP Conference in Boston over the past few days. (8) I reached out to him and asked to meet at the conference to talk about pairwise and combinatorial testing methods and to share my findings that - in the dozens of projects I've been involved with that have compared testers' efficiency and effectiveness - I've routinely seen defects found per tester hour more than double. (9) I was interested in getting his insights into where these methods are most applicable, where they are least applicable, what his experiences have been in teaching combinatorial testing methods to students, and so on.

In short, frankly, my goals in meeting with him were to: (a) meet someone new, interesting, and knowledgeable, and learn as much as I could from his experiences, his impressive critical thinking, and his questioning nature, and (b) avoid tripping up with sloppy reasoning (while unapologetically expressing the reasons I feel combinatorial testing methods are dramatically under-appreciated by the software testing community) in front of someone who (i) can smell BS a mile away and (ii) doesn't suffer fools gladly.

I learned a lot, heard some fantastic war stories, and heard his excellent counter-examples that disproved a couple of the generalizations I was making (but didn't dampen my unshaken assertion that combinatorial testing methods are wildly under-utilized by the software testing community). I thoroughly enjoyed the experience. Moving forward, as a result of our meeting, I will go through an exercise that will make me more effective: carefully thinking through and enumerating all of the assumptions behind statements of mine like "I've measured the effectiveness of testers dozens of times - trying to control external variables as much as reasonably possible - and I'm consistently seeing more than twice as many defects per tester hour when testers adopt pairwise/combinatorial testing methods."

His compliment last night was private so I won't share it, but it ranks up there among my all-time favorite compliments. I'm honored. Thanks, James.

By: Justin Hunter on Oct 23, 2009

Categories: Combinatorial Testing, Design of Experiments, Efficiency, Interesting People , Pairwise Testing, Software Testing, Software Testing Efficiency, Testing Case Studies, Uncategorized

I have just created the first video overview of the Hexawise test case generator. Please take a look and let me know your thoughts (either with an email or a comment below).

 

[Video: Introduction to Hexawise Pairwise Testing Tool / Combinatorial Testing Tool]

 

I'll refine and hopefully improve it over time, but I wanted to share it at this point for feedback. Is the pace of the video too slow? Does it have too much detail about pairwise coverage? Does the fact that I've got a dull, Midwestern, nasal monotone mean I should have someone with a more animated and melodious "voice made for radio" do the voice-over?

Thanks in advance for your feedback!

By: Justin Hunter on Oct 20, 2009

Categories: Uncategorized

[Photo: Matthew Heusser]

 

Matthew Heusser - an accomplished tester, frequent blogger, insightful contributor to the Context-Driven Testing mailing list, and a testing expert whose opinion I respect a lot - has just published a very thought-provoking blog post that highlights an important issue surrounding "PowerPointy" consultants in the testing industry who have relatively weak real-world testing chops. It's called "[The Fishing Maturity Model](http://blogs.stpcollaborative.com/matt/2009/10/08/the-fishing-maturity-model/)."

Matthew argues that testers are well-advised to be skeptical of self-described testing experts who claim to "have the answer" - particularly when such "experts" haven't actually rolled their sleeves up and done software testing themselves. His post hit close to home: while I'm by no means a testing expert in the broader sense of the term, I do consider myself to know enough about combinatorial test design strategies applicable to software testing to help most testing teams become demonstrably more efficient and effective... and yet my actual hands-on testing experience is admittedly quite limited. Even if I'm not one of the guys he's (justifiably) skewering with his funny and well-reasoned post (and he assures me I'm not; see below), a tester could certainly be forgiven for mistaking me for one.

Matthew's five levels of the Fishing Maturity Model (based, not so loosely, of course, on the Testing Maturity Model, not to mention CMM and CMMi)...

 

The five levels of the fishing maturity model:

  1 – Ad-hoc. Fishing is an improvised process.

  2 – Planned. The location and timing of our ships is planned. With a knowledge of how we did for the past two weeks, knowing we will go to the same places, we can predict our shrimp intake.

  3 – Managed. If we can take the shrimp fishing process and create standard processes – how fast to drive the boat, how deep to let out the nets, how quickly, etc. – we can improve our estimates over time.

  4 – Measured. We track our results over time – to know exactly how many pounds of shrimp are delivered at what time with what processes.

  5 – Optimizing. At level 5, we experiment with different techniques to see what gathers more shrimp and what does not. This leads us to continual improvement.

 

Sounds good, right? Why, with a little work, this would make a decent 1-hour conference presentation. We could write a little book, create a certification, start running conferences …

 

And the rub...

The problem: I’ve never fished with nets in my entire life. In fact, the last time I fished with a pole, I was ten years old at Webelo’s camp.

I posted the following response, based on my personal experiences. Words in [brackets] are Matthew's responses to me.

Matthew,

Excellent post, as usual. [I'm glad you like it. Thank you.]

You raise very good points. Testers (and other IT executives) should be leery of snake oil salesmen and use their judgment about “experts” who lack practical hands-on experience. While I completely agree with this point, I offer up my own experiences as a “counter-example” to the problem you pointed out here.

3-4 years ago, while I was working at a management consulting and IT company (with a personal background as an entrepreneur, lawyer, and management consultant – and not in software testing), I began to recommend to any software testers who would listen that they start using a different approach to how they designed their test cases. Specifically, I was recommending that testers begin using applied statistics-based methods* designed to maximize coverage in a small number of tests rather than continuing to manually select test cases and rely on SMEs to identify combinations of values that should be tested together. You could say I was recommending that they adopt what I consider to be (in many contexts) a "more mature" test design process.

The reaction I got from many teams was, as you say, "this whole thing smells fishy to me" (or some more polite version of the rebuttal "Why in the world should I, with my years of experience in software testing, listen to you – a non-software tester?"). Here's the thing: when teams did use the applied statistics-based testing methods I recommended, they consistently saw large reductions in how long it took them to identify and document tests (often 30-40%) and they often saw huge leaps in productivity (e.g., finding more than twice as many defects per tester hour). In each proof of concept pilot, we measured these carefully by having two separate teams – one using "business as usual" methods, the other using pairwise or orthogonal array-based test design strategies – test the same application. Those dramatic results led to my decision to create [Hexawise](http://www.hexawise.com/users/new), a software test design tool. [Point Taken ...]

My closing thoughts related to your post boil down to:

  1. I agree with your comment – “There are a lot of bogus ideas in software development.”

  2. I agree that testers shouldn’t accept fancy PowerPointed ideas like “this new, improved method/model/tool will solve all your problems.”

  3. I agree that testers should be especially skeptical when the person presenting those PowerPointed slides hasn’t rolled up their sleeves for years as a software testing practitioner.

Even so…

  1. Some consultants who lack software testing experience actually are capable of making valuable recommendations to software testers about how they can improve their efficiency and effectiveness. It would be a mistake to write them off as charlatans because of their lack of software testing experience. [I agree with the sentiment that sometimes, people out of the field can provide insight. I even hinted at that with the comment that at least, Forrest should listen, then use his discernment on what to use. I'm not entirely ready to, as the expression goes, throw the baby out with the bathwater.]

  2. Following the “bogus ideas” link above takes readers to your quote that: “When someone tells you that your organization has to do something ‘to stay competitive,’ but he or she can’t provide any direct link to sales, revenue, reduced expenses, or some other kind of money, be leery.” I enthusiastically agree. In the software testing community, in my view, we do not focus enough on gathering real data** about which approaches work (or -ideally- in what contexts they work). A more data-driven management approach would help everyone understand what methods and approaches deliver real, tangible benefits in a wide variety of contexts vs. those methods and approaches that look good on paper but fall short in real-world implementations. [Hey man, you can back up your statements with evidence, and you're not afraid to roll up your sleeves and enter an argument. I may not always agree with you, but you're exactly the kind of person I want to surround myself with, to keep each other sharp. Thank you for the thoughtful and well reasoned comment.]

-Justin

 

Company – http://www.hexawise.com
Blog – http://hexawise.wordpress.com
Forum – http://testing.stackexchange.com

 

*I use the term “applied statistics-based testing” to cover pairwise, orthogonal array-based, and more comprehensive combinatorial test design methods such as n-wise testing (which can capture, for example, every valid combination of values from any six parameters).

**Here is an article I co-wrote which provides some solid data that applied statistics-based testing methods can more than double the number of defects found per tester hour (and simultaneously result in finding more defects) as compared to testing that relies on "business as usual" methods during the test case identification phase.
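
To make the first footnote's terminology concrete, here is a small back-of-the-envelope calculation (using a hypothetical model of 10 parameters with 4 values each, not numbers from any project mentioned above) showing how many distinct value combinations an n-wise test set must cover as n grows:

```python
from math import comb

def t_way_tuples(num_params: int, values_per_param: int, t: int) -> int:
    """Number of distinct t-way value combinations a covering test set must hit."""
    return comb(num_params, t) * values_per_param ** t

# Hypothetical system under test: 10 parameters, 4 values each.
print("exhaustive tests needed:", 4 ** 10)  # 1,048,576
for t in (2, 3, 6):
    print(f"{t}-way combinations to cover:", t_way_tuples(10, 4, t))
```

A pairwise (2-way) suite for this model only has to cover 720 pairs, which is why it can be orders of magnitude smaller than the 1,048,576 exhaustive tests while still exercising every two-value interaction.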

By: Justin Hunter on Oct 12, 2009

Categories: CMM, Combinatorial Testing, Pairwise Testing, Software Testing, Software Testing Efficiency, Testing Maturity Model

Today I've released a beta version of testing.stackexchange.com, which is a "stackoverflow.com for software testers." I would appreciate your help in contributing content and/or getting the word out. Stackoverflow has become an extraordinarily useful forum for software developers to ask difficult, practical questions and get quick, actionable, peer-reviewed responses from software developers around the globe. While there are some software testing questions on stackoverflow itself, the questions are mostly software developer-centric. There's no reason why we can't create a very similar forum geared primarily towards the software testing community. So who's with me? Please show your support by posting a question, sharing an answer, or voting on existing answers at [testing.stackexchange.com](http://testing.stackexchange.com/).

If you share my belief in the significant potential benefit to the software testing community of a mature, well-trafficked site with a rich collection of peer-reviewed questions on software testing, and you would be interested in helping out beyond posting periodic questions and/or answers, please post a reply here or contact me through LinkedIn. I'd love to brainstorm ideas and work with like-minded people to get this forum created for the software testing community. As of now, the odds are against testing.stackexchange.com growing to the critical mass it needs (particularly since I'm busy day-to-day building my software testing tool company); a small number of active collaborators would improve the odds dramatically.

I first found out about stackoverflow.com through my brother's blog here.

 

Joel Spolsky's video is fantastic. He set out to crack the code on:

  • How can you get a useful exchange of information between experts that results in very good questions and answers being actively shared by participants?

  • How can the community encourage visitors to the site to actively participate and share their expertise?

  • How can the site generate a critical mass and utilize Google to drive traffic to the site to make it self-sustaining?

  • How can users (who might not otherwise be able to tell which are the best answers from among multiple answers) tell which answers are in fact the best?

 

In my view, he has succeeded on all of the above counts, which is truly impressive. We're using the identical strategies (and Spolsky's technology) at testing.stackexchange.com. The way Spolsky lays out his vision is impressive. He logically progresses through a graveyard of multiple Q & A sites that have devolved into largely useless forums where inane questions are asked and dubious answers are shared. He then shares how he and his collaborators adjusted the model for Stackoverflow to maximize the value to participants. Their self-described strategy amounts to taking the best ideas they could from multiple different sites and putting them together in stackoverflow (and "using Google as our landing page" as a way to build traffic).

Thank you in advance for helping to get the word out.

 

By: Justin Hunter on Oct 6, 2009

Categories: Software Testing

I enjoyed talking about efficient and effective combination testing strategies (and highlights of a recent empirical study) at yesterday's TISQA meeting, together with Lester Bostic of Blue Cross Blue Shield of North Carolina, who shared his team's experiences adopting a combinatorial testing approach. The presentation addresses how tools like Hexawise can help software testers quickly identify the test cases they should execute to find as many defects as possible with as few tests as possible. I wanted to share it now; once I have more time, I will comment on it and highlight some of the good questions, comments, discussion points, and tester experiences raised by the attendees.

The presentation focused on combinatorial testing techniques such as pairwise testing, orthogonal array-based testing methods, and more thorough combination testing strategies (capable of identifying all defects that could be triggered by, say, any possible combination of three or four "things" you've decided to test for - regardless of whether those "things" are features, configurations, equivalence classes of data, types of users, or a mix of each).

The middle of the presentation highlights empirical evidence that this method of identifying test cases often has an enormous impact on how quickly software testers are able to find defects; in the IEEE Computer article on combinatorial testing that I co-wrote last month, this approach led, on average, to more than twice as many defects found per tester hour.

The final section of the presentation was delivered by Lester Bostic of Blue Cross Blue Shield and addresses his lessons learned. Lester used Hexawise to reduce the 1,356,136,048,589,996,428,428,909,447,344,392,489,476,985,674,792,960 possible tests that would have been necessary to achieve comprehensive testing of the application he was testing to only 220 tests, which proved to be extremely effective at identifying defects.
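
To give a feel for how that kind of reduction works, here is a minimal greedy 2-way (pairwise) generator over a small, made-up parameter model. It is an illustrative sketch only; it is not Hexawise's actual algorithm, and the parameters below are invented, not Lester's real model.

```python
import itertools

# Hypothetical parameter model (four parameters, three values each).
parameters = {
    "browser": ["IE", "Firefox", "Chrome"],
    "user":    ["admin", "member", "guest"],
    "plan":    ["individual", "family", "group"],
    "payment": ["card", "check", "invoice"],
}

def pairwise_tests(params):
    """Greedily build a 2-way covering set: every value pair from every two
    parameters appears together in at least one generated test.

    Brute-forces the full cartesian product, so it only suits tiny models."""
    names = list(params)
    uncovered = {
        ((a, va), (b, vb))
        for a, b in itertools.combinations(names, 2)
        for va in params[a] for vb in params[b]
    }
    tests = []
    while uncovered:
        best, best_covered = None, set()
        for combo in itertools.product(*(params[n] for n in names)):
            test = dict(zip(names, combo))
            covered = {pair for pair in uncovered
                       if all(test[p] == v for p, v in pair)}
            if len(covered) > len(best_covered):
                best, best_covered = test, covered
        tests.append(best)
        uncovered -= best_covered
    return tests

exhaustive = 1
for values in parameters.values():
    exhaustive *= len(values)

suite = pairwise_tests(parameters)
print(f"exhaustive: {exhaustive} tests, pairwise: {len(suite)} tests")
for test in suite:
    print(test)
```

For this toy model, exhaustive testing needs 81 tests while the greedy pass emits roughly ten; the coverage guarantee (every pair of values appears together at least once) is the same idea that lets real tools collapse astronomically large models like the one above down to a few hundred tests.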

Comments and questions are welcome.

 

By: Justin Hunter on Sep 18, 2009

Categories: Combinatorial Testing, Pairwise Testing, Software Testing, Software Testing Efficiency, Testing Case Studies

[Image: lessons from car manufacturing]

 

Tony Baer from Ovum recently wrote a blog post titled "Software Development is like Manufacturing," which included the following quotes:

"More recently, debate has emerged over yet another refinement of agile – Lean Development, which borrows many of the total quality improvement and continuous waste reduction principles of [lean manufacturing](http://www.lean.org/WhatsLean/. Lean is dedicated to elimination of waste, but not at all costs (like Six Sigma. Instead, it is about continuous improvement in quality, which will lead to waste reduction....

In essence, developing software is like making a durable good like a car, appliance, military transport, machine tool, or consumer electronics product.... you are building complex products that are expected to have a long service life, and which may require updates or repairs."

Here are my views: I see valid points on both sides of the debate. Rather than weigh general high-level pros and cons, though, I would like to zero in on what I see as an important topic that is all too often missing from the debate. Specifically, Design of Experiments has been central to Six Sigma, Lean Manufacturing, the Toyota Production System, and Deming's quality improvement approaches, and is equally applicable to software development and testing, yet adoption of Design of Experiments methods in software design and testing remains low. This is unfortunate because significant benefits consistently result in both software development and software testing when Design of Experiments methods are properly implemented.

What are Design of Experiments Methods and Why are they Relevant?

In short, Design of Experiments methods are a proven approach to creating and managing experiments that alter variables intelligently between each test run in a structured way that allows the experimenter to learn as much as possible in as few experiments as possible. From wikipedia: “Design of experiments, or experimental design, (DoE) is the design of all information-gathering exercises where variation is present, whether under the full control of the experimenter or not. Often the experimenter is interested in the effect of some process or intervention (the “treatment”) on some objects (the “experimental units”).”
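
As a deliberately tiny, hypothetical illustration of "learning as much as possible in as few experiments as possible": a classic two-level fractional factorial design studies three factors in four runs instead of the eight a full factorial needs, by generating the third factor's settings from the first two.

```python
from itertools import product

# Full 2^3 factorial: every combination of three two-level factors (8 runs).
full = list(product([-1, +1], repeat=3))

# Half-fraction 2^(3-1) design: choose A and B freely, then set C = A * B.
# Four runs still allow each factor's main effect to be estimated, at the
# cost of confounding main effects with two-factor interactions.
fraction = [(a, b, a * b) for a, b in product([-1, +1], repeat=2)]

print("full factorial runs:   ", len(full))
print("fractional design runs:", len(fraction))
for run in fraction:
    print(run)
```

That confounding is the trade-off the experimenter accepts in exchange for half the runs, and choosing such trade-offs deliberately rather than by accident is exactly what Design of Experiments methods are for.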

Design of Experiments methods are an important aspect of Lean Manufacturing, Six Sigma, the Toyota Production System, and other manufacturing-related quality improvement approaches and philosophies. Not only have Design of Experiments methods been very important to all of the above in manufacturing settings, they are also directly relevant to software development. By way of example, W. Edwards Deming, who was extremely influential in quality initiatives in manufacturing in Japan and the U.S., was an applied statistician. He and thousands of other highly respected quality executives in manufacturing, including Box, Juran, and Taguchi (and even my dad), have regularly used Design of Experiments methods as a fundamental anchor of quality improvement and QA initiatives, and yet relatively few people who write about software development seem to be aware of the existence of Design of Experiments methods.

What Benefits are Delivered in Software Development by Design of Experiments-based Tools?

Application optimization tools, like Google's Website Optimizer, are a good example of how Design of Experiments methods can deliver powerful benefits in the software development process. The tool allows users to easily vary multiple aspects of web pages (images, descriptions, fonts, colors, logos, etc.) and capture the results of user actions to identify which combinations work best. A recent YouTube multi-variate experiment (i.e., an experiment created using Design of Experiments methods) shows how they used this simple tool to increase sign-up rates by 15.7%. The experiment involved 1,024 variations.

What Benefits are Delivered in Software Testing by Design of Experiments-based Tools?

In addition, software test design tools, like the Hexawise test design tool my company created, enable dramatically more efficient software testing by automatically varying different elements of the use cases being tested in order to achieve optimal coverage. Users input the things in the application they want to test, push a button, and, as in the Google Website Optimizer example, the tool uses DoE algorithms to identify how the tests should be run to maximize efficiency and thoroughness. A recent IEEE Computer article I contributed to, titled "Combinatorial Testing," shows that, on average, over the course of 10 separate real-world projects, tester productivity (measured in defects found per tester hour) more than doubled compared to control groups that continued to use their standard manual methods of test case selection: http://tinyurl.com/nhzgaf

Unfortunately, Design of Experiments methods – one of the most powerful methods in Lean Manufacturing, Six Sigma, and the Toyota Production System – are not yet widely adopted in the software development industry. This is unfortunate for two reasons, namely:

  1. Design of Experiments methods will consistently deliver measurable benefits when implemented correctly, and

  2. Sophisticated new tools designed with very straightforward user interfaces make it easier than ever for software developers and testers to begin using these helpful methods.

By: Justin Hunter on Aug 25, 2009

Categories: Agile, Design of Experiments, Efficiency, Lean, Multi-variate Testing, Software Testing, Software Testing Efficiency

I tend to enjoy drinking beers with people who have diverse interests and viewpoints on an oddball assortment of topics. By that criterion, Jerry Brito seems like he'd definitely be a good person to drain a few pints with.

 


Jerry Brito

 

He is a senior research fellow at George Mason University. His interests span:

  • "Simple living" (he's the creator of the creator of Unclutterer, a popular blog about personal organization and all things simple living)

  • Legal issues (he researches and publishes about IT and telecom policy, government transparency, the regulatory process, keepin' an eye on "the man" and his Stimulus spending, etc.)

  • Oddball food stuff (he's a contributor to the irreverent food blog Crispy on the Outside)

Last week, thanks to Michael Bolton's tweet, I stumbled upon a list Jerry put together of famous quotes that make just as much sense when you substitute "PowerPoint" for "Power."

While it has virtually nothing to do with combinatorial or pairwise software testing methods, test design best practices, or Hexawise, I found it amusing and suggested a couple of additions. My suggested additions to his list include:

  • "It is said that PowerPoint corrupts, but actually it's more true that PowerPoint attracts the corruptible. The sane are usually attracted by other things than PowerPoint." - David Brin

  • "There can never be a complete confidence in a PowerPoint which is excessive." - Cornelius Tacitus

  • "An honest man can feel no pleasure in the exercise of PowerPoint over his fellow citizens." - Thomas Jefferson

  • "PowerPoint is not alluring to pure minds." - Jefferson again

  • "All men having PowerPoint ought to be distrusted to a certain degree." - James Madison

  • "I was going to buy a copy of The PowerPoint of Positive Thinking, and then I thought: What the hell good would that do?" - Ronnie Shakes

  • "PowerPoint tempts even the best of men to take liberties with the truth." - Joseph Sobran

By: Justin Hunter on Aug 25, 2009

Categories: Interesting People

[Image: HiPPO]

 

Jeff Fry recently linked to a fantastic webcast in his post Controlled Experiments To Test For Bugs In Our Mental Models. I would highly recommend it to anyone without any reservations. Ron Kohavi of Microsoft Research does a superb job of using interesting real-world examples to explain the benefits of conducting small experiments with web site content and the advantages of making data-driven decisions. The link to the 22-minute video is here.

I firmly believe that the power of applied statistics-based experiments to improve products is dramatically under-appreciated by businesses (and, for that matter, business schools), as well as by the software development and software testing communities. Google, Toyota, and Amazon.com come to mind as notable exceptions to this generalization; they "get it." Most firms, though, still operate, to their detriment, with their heads in the sand and place too much reliance on untested guesswork, even for fundamentally important decisions that would be relatively easy to double-check, refine, and optimize through the kind of small applied statistics-based experiments that Kohavi advocates. Few people who understand how to properly conduct such experiments are as articulate and concise as Kohavi. Admittedly, I could be accused of being biased: (a) I am the son of a prominent applied statistician who passionately promoted broader adoption of such methods by industry, and (b) I am the founder of a software testing tools company that uses applied statistics-based methods and algorithms to make our tool work.

Here is a short summary of Kohavi's presentation:

 

Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO

1:00 Amazon: in 2000, Greg Linden wanted to add recommendations in shopping carts during the checkout process. The "HiPPO" (the Highest Paid Person's Opinion) was against it, thinking that such recommendations would confuse and/or distract people. Amazon, a company with a good culture of experimentation, decided to run a small experiment anyway, "just to get the data" – it was wildly successful and is in widespread use today at Amazon and other firms.

3:00 Dr. Footcare example: Including a coupon code above the total price to be paid had a dramatic impact on abandonment rates.

4:00 "Was this answer useful?" Dramatic differences in user response rates occur when Y/N is replaced with 5 Stars and whether an empty text box is initially shown with either (or whether it is triggered only after a user clicks to give their initial response)

6:00 Sewing machines: experimenting with a sales promotion strategy led to extremely counter-intuitive pricing choice

7:00 "We are really, really bad at understanding what is going to work with customers…"

7:30 "DATA TRUMPS INTUITION" {especially on novel ideas}. Get valuable data through quick, cheap experimentation. "The less the data, the stronger the opinions."

8:00 Overall Evaluation Criteria: "OEC" What will you measure? What are you trying to optimize? (Optimizing for the “customer lifetime value”)

9:00 Analyzing data / looking under the hood is often useful to get meaningful answers as to what really happened and why

10:30 A/B tests are good; more sophisticated multi-variate testing methods are often better

12:00 Some problems: Agreeing upon Overall Evaluation Criteria is hard culturally. People will rarely agree. If there are 10 changes per page, you will need to break things down into smaller experiments.

14:00 Many people are afraid of multiple experiments [e.g., multi-variate experiments or MVE] much more than they should be.

(A/B testing can be as simple as changing a single variable and comparing what happens when it is changed, e.g., A = "web page background = Blue" / B = "web page background = Orange." Multi-variate experiments involve changing multiple variables in each test run, which means that the people running the tests need to change the variables efficiently and effectively in order to ensure not only that each variable is tested but also that each variable is tested in conjunction with each of the others, because they might interact with one another.) My views on this: before software tools made conducting multi-variate experiments (and understanding their results) a piece of cake, this fear had some merit; you would need to be able to understand books like this to competently run and analyze such experiments. Today, however, many tools, such as Google's Website Optimizer (used for making web sites better at achieving their click-through goals, etc.) and Hexawise (used to find defects with fewer test cases), build the complex Design of Experiments-based optimization algorithms into the tool's computation engine and provide the user with a simple user interface and user experience. In short, in 2009, you don't need a PhD in applied statistics to conduct powerful multi-variate experiments. Everyone can quickly learn how to, and almost all companies should, use these methods to improve the effectiveness of applications, products, and/or production methods. Similarly, everyone can quickly learn how to, and almost all companies should, use these methods to dramatically improve the effectiveness of their software testing processes.
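
Kohavi's talk does not walk through the arithmetic, but the "natural variation" point that follows is easy to illustrate with a standard two-proportion z-test. The numbers below are made up, and the test is a generic statistical check rather than anything specific to the tools mentioned above.

```python
from math import erf, sqrt

def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Made-up counts: version A converts 200 of 10,000 visitors, version B 230 of 10,000.
z, p = two_proportion_z_test(200, 10_000, 230, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")  # p is about 0.14 here, so the lift may be noise.
```

With these made-up counts, a 15% relative lift in conversions still yields a p-value around 0.14, so the difference could plausibly be noise; that is exactly the kind of conclusion-jumping the next point warns about.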

16:00 People do a very bad job at understanding natural variation and are often too quick to jump to conclusions.

17:00 eBay does A/B testing and makes the control group ~1%. Ron Kohavi, the presenter, suggests starting small then quickly ramping up to 50/50 (e.g., 50% of viewers will see version A, 50% will see version B).

19:00 Beware of launching experiments that "do not hurt"; there are feature maintenance costs.

20:00 Drive to a data-driven culture. "It makes a huge difference. People who have worked in a data-driven culture really, really love it… At Amazon… we built an optimization system that replaced all the debates that used to happen on Fridays about what gets on the home page with something that is automated."

21:00 Microsoft will be releasing its controlled experiments on the web platform at some point in the future, but probably not in the next year.

21:00 Summary

  1. Listen to your customers because our intuition at assessing new ideas is poor.

  2. Don't let the HiPPO drive decisions; they are likely to be wrong. Instead, let the customer data drive decisions.

  3. Experiment often; create a trustworthy system to accelerate innovation.

 

Related: Statistics for Experimenters - Articles on design of experiments

By: Justin Hunter on Aug 18, 2009

Categories: Design of Experiments, Multi-variate Testing, Software Testing