Thursday, May 9, 2013

Acceptance Test Driven Development - are we flogging a dead horse?

Or - Functional Testing Practices I do and Don’t Like and Why

Functional Testing Practices I don’t like

I’m going to start with what I don’t like as it will create more context to what I do like. And it is nicer to end on a happy note. But first of all, let’s get the definitions out of the way.

What exactly is ATDD?

The first thing I don’t like about ATDD is the sheer amount of confusion in the industry about what the term means. These links show how muddled the definitions have become:
  1. ATDD vs. BDD vs. Specification by Example vs ....
  2. The Sportscar Metaphor: TDD, ATDD, and BDD Explained
  3. ATDD versus BDD and the proper use of a framework
In order to move past this hurdle and on with my argument, here is my definition of the term, and of some others related to it, so that we have a common understanding from this point.
Acceptance tests :-  otherwise known as customer tests and previously known as functional tests, are tests that match/replace the acceptance criteria of a story. The concept has its roots in Extreme Programming (XP). See http://c2.com/cgi/wiki?FunctionalTest. The idea is that the customer will accept the story when all the acceptance tests pass. Or put another way, a passing suite of acceptance tests constitutes part of the Definition of Done for a story and/or iteration/sprint.
Because acceptance criteria are typically expressed at a high level of user functionality, Automated Acceptance Tests (AAT) are often written using GUI driving testing frameworks e.g. Selenium. These tools are sometimes referred to as Acceptance Testing Tools/Frameworks.
The agile community have long wanted acceptance tests to be written by the customer (hence the newer name of customer tests) and ideally before work starts on the story. Because the acceptance tests are written before the code, this led to the term Acceptance Test Driven Development (ATDD) or Automated Acceptance Test Driven Development (AATDD) because it somewhat follows the pattern of test before code as practiced by Test Driven Development (TDD). I will talk more on TDD later. If you have implemented Scrum as your agile practice read Product Owner in place of customer. You will find me switching between the two phrases in this article.

Product owners didn’t buy into the vision

The reality of where the above vision of acceptance testing has ended up is that most product owners are either not able to write such tests or not interested in doing so, particularly if it requires using a scripting or programming language.
When the agile community witnessed this reluctance, they thought that if they could create functional testing tools that would allow for the tests to be created using applications that product owners are comfortable with, like Excel or Word, then the practice would become more palatable and widely accepted. This resulted in tools such as FitNesse. I’m going to step out onto the ledge and say that for the most part these tools didn’t work in the manner that was hoped for. Even using this medium to write tests, product owners were still reluctant to produce testing artifacts and found the output of FitNesse confusing and of little value to them.
So the community went back to the drawing board and valiantly tried another approach. Enter Behaviour Driven Development (BDD). The BDD approach was to create a framework where the tests are written in plain English and saved as plain text documents. The thinking was: if we make it simpler, product owners and customers will want to write tests. They hoped and thought that if the customers could do this, then the clever devs and their BDD frameworks would do all the other magic to turn these plain text documents into automated acceptance tests.
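To make that "magic" a little more concrete, here is a minimal sketch of how a plain-text scenario can be bound to code. It is not how any particular BDD framework is implemented; the `step` decorator, the `STEPS` registry, `run_scenario` and the basket example are all hypothetical names invented for this illustration.

```python
import re

# Hypothetical registry of (compiled pattern, function) pairs.
STEPS = []


def step(pattern):
    """Register a function as the implementation of a plain-text step."""
    def decorator(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return decorator


@step(r"Given a basket containing (\d+) items?")
def given_basket(ctx, count):
    ctx["basket"] = int(count)


@step(r"When the customer removes (\d+) items?")
def when_remove(ctx, count):
    ctx["basket"] -= int(count)


@step(r"Then the basket contains (\d+) items?")
def then_basket(ctx, count):
    assert ctx["basket"] == int(count), ctx["basket"]


def run_scenario(text):
    """Run every non-blank line of a scenario against the registered steps."""
    ctx = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        for pattern, func in STEPS:
            match = pattern.fullmatch(line)
            if match:
                func(ctx, *match.groups())
                break
        else:
            # No registered pattern matched this line of plain text.
            raise LookupError(f"No step definition matches: {line!r}")
    return ctx


# The part a product owner would be expected to write.
SCENARIO = """
Given a basket containing 3 items
When the customer removes 1 item
Then the basket contains 2 items
"""

result = run_scenario(SCENARIO)
```

Note that every sentence in the scenario needs a matching pattern maintained by a developer. Rewording a step in the text file silently breaks the binding until someone updates the regular expression, which is part of the fixture cost discussed later in this post.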
I am yet to see a product owner get excited about this breakthrough and throw their arms into the air declaring “Hallelujah! Finally I am in control of the quality of this project” and thank the agile community for developing the tool that they have been waiting for all their life. In short, I feel agilists are guilty of not understanding the product owner’s needs and are telling them how they should be doing their job. This is akin to software companies with the view - “If only the customers would get just how awesome this product is then they would use it the way that we designed it - the way they are supposed to use it. The users need all these awesome features we have created for them but they just don’t understand the concept. The user is the problem and we need to educate them.”

Product Owners aren’t coders, they’re business people

Product Owners have their role because they understand requirements, customers, business drivers, marketing, sales etc. They are often business analysts and spend much of their time talking with customers, stakeholders, sales, marketing and usability experts, visiting sites/customers and of course making themselves available to the team to answer questions, provide scope and resolve any uncertainty that may arise in a story during an iteration. So thinking of acceptance criteria in a certain format or in a programming language is more often than not incompatible with their skill set and interests. In my experience, this is why they are reluctant to write acceptance tests. They seem happy to create acceptance criteria when asked but are, more often than not, reluctant to do anything more than that.

Acceptance tests and more especially BDD tests are hard to refactor

Acceptance tests tend to just grow in a corner of the codebase and are never refactored. The acceptance test regression suite gets added to each iteration and no thought goes into the question of duplication and quality. E.g. Do we need to refactor these tests? Are they all still relevant?
Test Code is a first class citizen and needs refactoring. Acceptance tests often don’t lend themselves to being refactored and I have yet to see a way to refactor plain text BDD tests.

Acceptance and BDD test suites are slow and this only gets worse over time

An acceptance test suite (sometimes called a regression suite) keeps getting bigger and bigger and slower and slower. One day you work on a story and find that you break a hundred or hundreds of these tests with one small change and have to go back and edit each of these failing tests. Or what more often happens - the entire suite is thrown out and the ATDD/BDD exercise is declared a failure.
When running an acceptance test regression suite (the accumulated acceptance tests) starts taking an extended period of time, you lose the benefit of short feedback cycles. The likelihood of a dev team taking action on a failed build falls as the length of time the build takes rises. That is, the longer the build, the less likely it is to be maintained and acted upon when it fails. Without refactoring, any test suite will suffer from accretion and entropy, and will eventually be abandoned - particularly if the suite takes longer and longer to run.

Acceptance and BDD test suites are fragile and time costly to maintain

Acceptance tests are known for suffering from false positive test failures. Development teams are unlikely to take any action when the build fails if they are tired of expending vast amounts of time and energy trying to fix fragility. That is - ascertaining whether the failure was caused by a bug in the code, a bug in the test, a change in the environment or just the suite framework being flaky. Fixing bugs that are intermittent and hard to reproduce is notoriously difficult and time consuming. When this happens repeatedly, the solution is often to comment out the test, or remove the test, or abandon the entire suite if it happens too often.

ATDD focuses on the wrong part of the testing pyramid

Don’t get me wrong, I like integration, system and end-to-end tests. I’m in favour of testing the entire pyramid (see the pyramid below).  I’m not suggesting throwing the baby out with the bathwater when I criticize ATDD and BDD. Getting functional tests to run quickly, to be robust and to be appropriate to the product is a skill of an advanced development team. Too often new teams leap on ATDD as a best practice and the testing pyramid is turned upside down as their primary focus moves to the top of the pyramid where ATDD lives. In the pyramid diagram you will notice that unit testing and Test Driven Development (TDD) should be the primary focus of the team and the foundation block of all testing. When a team is proficient at TDD with unit tests, they are more likely to create the appropriate level of testing on the higher levels of the pyramid. ATDD should not be undertaken by beginner (or SHU level) development teams in my opinion.

ATDD is too often mistaken as TDD (Test Driven Development)

“We do TDD here” is a comment I hear too often from teams that are not doing any unit testing but have focused their testing efforts on functional testing only and believe this is what TDD is. (As an aside even if you are writing unit tests, that doesn’t mean you are doing TDD. Unless you are writing the unit tests at the same time as the code, then you are not doing TDD.)

ATDD can encourage big design up front instead of emergent design

Everything else in agile follows the philosophies of just in time (JIT) and simple design, yet many of the assertions that you come up with in an acceptance test before you implement the code are based on assumptions about functionality that turn out to be wrong once the code and design have emerged. This then requires you to rewrite the acceptance tests during the iteration. Loose or vague requirements (as favoured by agile) mean that the way you implement a story may be very different to the way that you thought it would be implemented before you started the story. The ability to evolve the solution is embedded into the nature of agile processes and is indeed one of its strengths. Because of this I’m going to claim that many acceptance tests go against agile architecture and emergent design.
This mostly happens when your acceptance tests are driving a GUI. Now I am not opposed to designing a GUI up front, and indeed it is helpful for the developers to have a design or mock to work to when implementing the story. But what often happens is that only when you start implementing a design (coding) do you find all the flaws in the design, and all the edge cases that the designer did not think of when creating it. This results in the design and/or functionality changing, or emerging (part of emergent design), during an iteration, which is why I feel it would have been a waste of time and effort to have written tests against the GUI before the GUI was implemented.

ATDD is time consuming and not lean

The practice of ATDD can consume time in these areas : -
  1. Creating fixtures to drive BDD tests
  2. Re-writing the tests after emergent design has changed the original design
  3. Maintaining a fragile suite that suffers from false positives
  4. Waiting for the tests themselves, which are slow to run
This is not an insignificant amount of time and definitely not lean in nature.

BDD context switching

I prefer acceptance testing tools that sit on top of the same technology that you are using for unit testing. E.g. JWebUnit runs on JUnit. You can edit and run the tests all in the same IDE.
BDD testing involves switching between IDEs and/or language technologies, which feels unnecessarily disjointed to me.

When do you call the horse dead?

The industry has had long enough to try the concept of acceptance testing out on the community of product owners and the uptake has proven to be poor. I would call it, in general, a failure.
So how much longer are we going to flog this dead horse? Compare this to the uptake and acceptance of TDD as a practice. The TDD proof is in the pudding. TDD has stuck and works. ATDD did not and is a dead horse that the agile community at large seems to keep wanting to revive by changing its shape and creating new frameworks and approaches instead of ditching it.

Testing Practices I do like

Writing good, clean and readable tests

Your tests are a first class citizen and form part of the documentation (or specification) of your application. Learning to write good tests, wherever they are in the pyramid, is an art in itself and one that must be learnt by every agile developer and practiced by every agile team. Bob Martin has a great chapter in Clean Code: A Handbook of Agile Software Craftsmanship on Unit Testing. The pattern of keeping test code clean applies to all types of automated tests.

Refactoring tests

All code suffers from rot and needs to be kept fresh, including test code. Learn to refactor your tests. I highly recommend this book on this topic :- xUnit Test Patterns: Refactoring Test Code by Gerard Meszaros.

Use the right tool for the right job. What Specification frameworks are good for

If your business has areas of functionality or business logic that have to meet compliance or legal requirements and be documented, then this can be a good use of a BDD or other specification testing tool. You can meet the compliance requirement around documentation and build a valuable test artifact at the same time. Win/win!

BDD test protocol

Given, when, then - is a nice way to think about your tests. It has become very popular among unit test frameworks also. I like it. It is similar to the Arrange, Act, Assert pattern, but it builds the pattern into the test protocol and thus improves the readability of tests.
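As a small illustration of the protocol at unit test level, here is a sketch using Python's standard `unittest` module. The `Account` class is hypothetical and exists only to give the tests something to exercise; the comments mark where each of the three phases sits.

```python
import unittest


class Account:
    """Hypothetical domain class used only for this illustration."""

    def __init__(self, balance=0):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class WithdrawalTest(unittest.TestCase):
    def test_withdrawal_reduces_balance(self):
        # Given an account holding 100 (Arrange)
        account = Account(balance=100)
        # When the owner withdraws 30 (Act)
        account.withdraw(30)
        # Then 70 remains (Assert)
        self.assertEqual(account.balance, 70)

    def test_overdrawing_is_rejected(self):
        # Given an account holding 10
        account = Account(balance=10)
        # When the owner tries to withdraw 50, then the request is refused
        with self.assertRaises(ValueError):
            account.withdraw(50)
        # And the balance is unchanged
        self.assertEqual(account.balance, 10)


# Run the two tests programmatically so the sketch works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(WithdrawalTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

The structure is the same as Arrange, Act, Assert; the only difference is that the comments read as a sentence about behaviour rather than as labels for code sections.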

Following the testing pyramid - Agile Testing

High performance teams I have worked with know how to use end-to-end and integration test frameworks effectively to ensure they are writing quality code and are not breaking functionality as they go. They know when to write these tests and what to test. They know how to refactor these tests to keep them fresh and relevant and to avoid fragility. They know how to avoid duplication in these tests and how to, and when to, implement configuration management. They take ownership of writing and maintaining these tests (as they do for the quality of the entire product in general). They know how and when to break these tests into suites to optimize build performance, for example by splitting out a smoke test suite that runs on every local build and a more complete regression suite that runs on a CI server after each check-in. They know how to create suites of tests and how to schedule and chain builds, e.g. every night we shall run stress/performance testing and/or benchmarking. And not to be forgotten, the base of the testing pyramid is solid! That is to say, they have embraced TDD and have a high level of unit test code coverage.
This is what I call “agile testing”. I have asked non-coding agile consultant colleagues to use this term, because it refers to the entire pyramid when talking about best practices for agile teams, rather than falling back on the (somewhat rhetorical) terms ATDD and TDD. The rule is - if you can’t write them, then don’t talk about them and instead use the term “Agile Testing”.
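The split between a fast smoke suite and a fuller regression suite can be sketched with nothing more than Python's standard `unittest` module. This is one possible approach under stated assumptions, not a recommendation of a specific tool: the `smoke` marker, the `build_suites` helper and the `CheckoutTests` class are hypothetical names, and the test bodies are placeholders.

```python
import unittest


def smoke(test_method):
    """Mark a test as part of the fast smoke suite (hypothetical marker)."""
    test_method.is_smoke = True
    return test_method


class CheckoutTests(unittest.TestCase):
    @smoke
    def test_home_page_loads(self):
        self.assertTrue(True)  # placeholder for a fast, critical-path check

    def test_full_order_history_export(self):
        self.assertTrue(True)  # placeholder for a slower, exhaustive check


def build_suites(test_case):
    """Return (smoke_suite, regression_suite) for one TestCase class."""
    loader = unittest.defaultTestLoader
    smoke_suite = unittest.TestSuite()
    regression_suite = unittest.TestSuite()
    for name in loader.getTestCaseNames(test_case):
        # The regression suite always contains every test.
        regression_suite.addTest(test_case(name))
        # The smoke suite contains only the tests carrying the marker.
        if getattr(getattr(test_case, name), "is_smoke", False):
            smoke_suite.addTest(test_case(name))
    return smoke_suite, regression_suite


smoke_suite, regression_suite = build_suites(CheckoutTests)
```

A local build would then run only `smoke_suite`, while the CI server runs `regression_suite` after each check-in. Because the marker lives next to the test, deciding which suite a test belongs to becomes part of ordinary test refactoring.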

References and other supporting articles


Flipping the Automated Testing Triangle: the Upshot - Patrick Welsh

TDD: Where Did It All Go Wrong? - Ian Cooper

The Problems With Acceptance Testing - James Shore

A Case Against Cucumber - Kevin Liddle

AT FAIL - Uncle Bob Martin

http://www.jimmycuadra.com/posts/please-don-t-use-cucumber

https://www.thoughtworks.com/p2magazine/issue12/bdd-dont/
