Jenkins, You Can Take the Evening Off
Recently I was asked to participate on a panel organized by Electric Cloud to discuss Continuous Integration (CI). So I spent a day doing some reading and researching to polish up on my knowledge around the topic. Most teams I have worked with feel that because they have a CI server (e.g. Jenkins), then they are "doing CI". But I feel that this is not necessarily true and is only part of what CI is. In fact Automated Continuous Integration software did not exist when the CI practice first came to light so it can't be what CI is. Which raises the question, what is it?
An understanding of the history of CI will be important to understand why I am going to recommend that new teams learn CI without a CI server initially.
Note to users of Git – when you read commit or check in anywhere in this article, think Push.
History of Continuous Integration
Extreme Programming (XP) adopted the practice of Continuous Integration as one of the twelve core practices and brought it to the mainstream agile community. While CI is not part of Scrum, many Scrum teams have chosen to adopt CI as a best practice.
Kent Beck describes the word extreme in Extreme Programming to mean that he took common sense principles and development practices to extreme levels. In this way, when talking about integration testing - “If integration testing is important, then we’ll integrate and test several times a day (continuous integration)” – Kent Beck.
The What of Continuous Integration?
Continuous integration is the process of merging all developer working copies with a shared mainline several times a day.
The “several times a day” point is important as this was the evolutionary step away from a nightly build (which was the most common practice at the time).
"Mainline" is another important word to note. I'll talk more about that below in the section "Trunk not Branch".
The Why of Continuous Integration
The goals of CI are :-
- early feedback (XP value)
- avoiding merge hell
- supporting collective code ownership (XP practice)
- encouraging simple design (XP practice)
- enabling merciless refactoring among a team (XP practice)
- ensuring working software (the primary measure of progress - Agile Principle)
All of the XP practices are there to enable refactoring. Refactoring is the core practice needed to architect a system with changing requirements.
Note that there is still no mention of CI servers or Automated Continuous Integration software yet. The technology did not exist when XP was born. This is an important point that has been lost to the masses who feel that CI is about having a CI server. I’ll talk more about where the CI server fits in later.
Before CI it was common to run a nightly build. You would come in the morning to see if the build passed. That meant that your feedback loop might be as long as 24 hours! That is a big hit in a one or two week iteration (or Sprint for the Scrum people). So it was important to integrate more often. We want to find out within a couple of hours if integration is going to work rather than a day. We want to find out within hours if there are architectural or design issues among the team, rather than after a day or days. A delay of days is not uncommon on teams that use Git and choose to work on separate feature or story branches and only merge them once the entire story is complete. This pattern can also cause "merge hell". Read on.
Avoiding Merge Hell
The longer you leave your code un-merged from the main branch, and the more of it there is, the harder the merge becomes and the more time it takes to resolve collisions. This becomes particularly apparent once a team adopts the XP practice of Merciless Refactoring. Within the scope of a single task, the changes made to the codebase are not limited to the minimum needed to complete the task (a pattern seen in Scrum teams that do not adopt merciless refactoring), so they can be numerous and might cover a broad swathe of application and test code.
Once the merges are numerous and there are many collisions, you are in “Merge Hell”. To avoid this, the pattern is to integrate your code every couple of hours. “No code sits un-integrated for more than a couple of hours. At the end of every development episode, the code is integrated with the latest release…” – Kent Beck Extreme Programming Explained: Embrace Change Chapter 16: Development Strategy
The mantra that I was taught when learning XP was to “Commit early, commit often”. This, I feel, is the first aspect of CI that a team needs to learn!
Collective Code Ownership and Simple Design
These benefits will become apparent in the Learning Continuous Integration section below under the headings with these same titles.
Once a team begins to refactor mercilessly, you will find yourself in merge hell more often than not unless you integrate with each other's code frequently. If you have an agile code base (to support an agile project i.e. one where the requirements are changing) - you should be refactoring mercilessly. If you don't, you will be incurring technical debt and end up with a legacy system. I have written a blog post on the topic of Iterating Toward Legacy which describes this in more depth.
The How of Continuous Integration
I recommend a team learn CI without a CI Server. (Which is what the title of this article is hinting at.) Once you have the core practices down, then, you are ready for a CI server. I like James Shore's description of How to Practice Continuous Integration. Probably the best I have read apart from the original white book by Kent Beck. Below are the core CI practices a team will need to learn :-
Note - We will assume that you have source control already set up and the team are proficient in its use.
Create scripts to build your code and have them committed to source control. This will often involve learning to use a build tool such as Maven, Gradle, Grunt etc. Technically, you can still do CI without a scripted build but this is rare and so I will not cover it. The build scripts need to include running a test suite and failing the build should any tests fail. For more information on build automation see James Shore’s website on automated build or his book The Art of Agile. You can see an example script here.
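As an illustration of "run the test suite and fail the build should any tests fail", here is a minimal sketch in Python. It is not the example script linked above. The step names and commands in BUILD_STEPS are assumptions made for illustration; a real team would substitute the commands of its own build tool.

```python
import subprocess
import sys

# Hypothetical build steps (an assumption for illustration).
# Replace the commands with whatever your own build tool needs.
BUILD_STEPS = [
    ("compile", [sys.executable, "-m", "compileall", "-q", "."]),
    ("test", [sys.executable, "-m", "unittest", "discover", "-q"]),
]


def run_build(steps):
    """Run each step in order and stop at the first failure.

    Returns 0 if every step succeeded, otherwise the exit code
    of the step that failed.
    """
    for name, command in steps:
        print(f"[build] {name}")
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"[build] FAILED at step: {name}")
            return result.returncode
    print("[build] OK")
    return 0
```

A wrapper script could end with `sys.exit(run_build(BUILD_STEPS))` so that a failing test produces a non-zero exit code, which is what lets a person (or later a CI server) treat the build as broken.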
The Ten Minute Build
This practice is a refinement on the automated build. Because we must run a complete build before we commit (or Push if you are using Git), we need to keep the build under ten minutes. This can be challenging for some teams who already have longer build times. You are able to do CI with longer builds – but the pattern is to try to reduce your build time in order to shorten feedback time, reduce collisions within a team and avoid build races.
Maintaining a fast build is an ongoing discipline of an agile team that requires a team to track the build time. It will require a team to change and refactor the build scripts and test code with the same attention that we give to our production code. In a team that is practicing Collective Code Ownership, everyone in the team is responsible and capable of editing these scripts, the source and test code.
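One lightweight way to track the build time is to time each build and flag the ones that exceed the budget. The sketch below is only an illustration under assumptions: the names BUILD_BUDGET_SECONDS, timed and over_budget are hypothetical, and how you store the recorded durations is left to the team.

```python
import time

# The ten-minute target from the text, expressed in seconds.
BUILD_BUDGET_SECONDS = 10 * 60


def timed(step, clock=time.monotonic):
    """Run a zero-argument callable and return (result, elapsed_seconds).

    The clock is injectable so the function can be tested without waiting.
    """
    start = clock()
    result = step()
    return result, clock() - start


def over_budget(durations, budget=BUILD_BUDGET_SECONDS):
    """Return the recorded build durations (in seconds) that exceed the budget."""
    return [d for d in durations if d > budget]
```

For example, `over_budget([480, 600, 615.2])` returns `[615.2]`: only the last build took longer than ten minutes, which is the signal to go and look at what slowed it down.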
Commit Early, Commit Often
In the early computer science days we were taught the mantra of “Save early, Save often”. With the advent of source control this became “Commit Early, Commit Often”. If you have gone a day without a single commit, it’s an indication that you may have gone down the rabbit hole of over complexity and need to roll back your changes and start all over again.
It’s not a bad thing when this happens - and it does happen. When this happens, you've learned what didn’t work and in doing so probably worked out how to break the problem down into smaller chunks and which order to tackle them in when you come back to it.
Trunk Not Branch
This one is always contentious with the Git folk. If you are using Git, it is not that you shouldn't create branches, it is that you need to merge your code with trunk with the same cadence that we spoke about above. Martin Fowler has an excellent article on the challenge of CI with Git. His closing line is a quote from Paul Hammant :-
"I wonder though, if a team should not be adept with trunk-based development before they move to distributed"
The reason that we stay on trunk is to avoid merge hell. If you must use branches, then try to ensure they are as short lived as possible.
Side note - In a highly distributed team this may not be possible. But then being highly distributed is already an agile anti-pattern.
Side note - If you are considering Git for your source control then consider the Centralized Workflow implementation.
The idea of the build token is that when you are about to kick off a build on your computer (or for a beginner team a dedicated integration computer), you take the build token. This is a physical item that all the team recognize. In most instances I have seen, it is a rubber chicken or other similar frivolous yet fun toy that typically can make a sound when squeezed.
Whilst others are able to build without the token, they are unable to commit their code (by team agreement).
This practice is to stop a build race. As the holder of the token, if your build passes, then you are able to commit your changes to trunk. At that point you initiate the token’s noise and return it to the ready location. The noise is an indication to the room that they need to merge changes into the code they are working on ASAP. This is where the continuous part comes in. Because now you are continuously merging in changes and keeping in sync with the whole team and continuously integrating changes to head/trunk. (Actually, continually would be a more correct word if one was going to be pedantic.)
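For readers who think in code, the build token behaves much like a mutual-exclusion lock around "build, then commit". The sketch below models it that way. It is an analogy and not a tool the team needs: the class BuildToken, the function integrate and the list standing in for trunk are all hypothetical names.

```python
import threading


class BuildToken:
    """Stands in for the rubber chicken: only one holder at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.holder = None

    def take(self, who):
        """Try to take the token; return False if someone else holds it."""
        if not self._lock.acquire(blocking=False):
            return False
        self.holder = who
        return True

    def give_back(self, who):
        """Return the token; only the current holder may do so."""
        if self.holder != who:
            raise RuntimeError(f"{who} does not hold the token")
        self.holder = None
        self._lock.release()


def integrate(token, who, build, trunk):
    """Take the token, run the build, and commit only if the build passes."""
    if not token.take(who):
        return "token busy"
    try:
        if not build():
            return "build failed, nothing committed"
        trunk.append(who)  # stands in for the commit to trunk
        return "committed"
    finally:
        # Squeeze the chicken and put it back, whatever the outcome.
        token.give_back(who)
```

While one person holds the token, everyone else can still build locally, but `integrate` returns "token busy" for them, which mirrors the team agreement that only the token holder commits.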
When a team begins to really practice Merciless Refactoring, then in the scope of any task, you could find yourself in parts of the code that you did not imagine before you began coding. This is not to be discouraged. The knock on effect of this though, is that you are far more likely to be colliding on classes and methods with other team members. When these collisions happen, you want to find out sooner rather than later so the two pairs (assuming that you are practicing the XP practice of Pair Programming) can get together and have an architectural discussion to resolve. This is part of fast feedback and how emergent design works!
Emergent Design and Rapid Feedback
Consider that we were able to discover an architectural issue within hours (rather than a day) and can quickly resolve it thereby minimizing the disruption to the flow of coding and likelihood of having to roll back changes. Consider also how easy it is for a co-located team to resolve an architectural issue via a quick face to face huddle in front of a white board. This is one reason why XP recommends development team members being co-located. It increases communication and can eliminate time lost because of distance issues.
Were you only integrating daily instead of multiple times a day, how does it feel to come in in the morning ready to start a new task after a good day's work yesterday, only to find that you need to go back and change the code from yesterday? Integrating 2-4 times a day enables a much finer grained level of Rapid Feedback. I believe the effort/cost of fixing issues increases exponentially over time. The sooner you find and fix it, the cheaper.
Another way to avoid collisions, and a pattern to refactor the code when you do have collisions, is to make classes and methods smaller. This is one of the rules of Kent Beck’s "Four Rules of Simple Design" and it's a good OO practice in general.
Simple Design is one of the XP core practices. I hope you are starting to see how all the XP practices work together and why you should take them all on as they are supportive of each other.
Side Note - If you are doing the XP practice of Test Driven Development (TDD), you will find it naturally encourages smaller classes and methods. See what I mean about the XP practices all working together?
Collective Code Ownership
Collective Code Ownership is another XP core practice. The heart of an agile project is a self-organized, self-managed team. (In XP that team is co-located and sit together in an open office environment. Scrum does not enforce this practice.) The idea of Collective Code Ownership is that anyone can edit any code. We are trying to break down silos of knowledge within a team. (Pair Programming is the quickest way to reach the point of Collective Code Ownership within a team.)
CI has been described as a prerequisite to Collective Code Ownership. By integrating often in the day, you will be keeping up to date with all the code changes that are happening by all of the team. You are staying in sync with all the team by continuously reading each other's code changes every few hours when you integrate.
Step 2 - Automated Continuous Integration
OK Jenkins - You can come back now
Once the team has all this down pat, then it is ready for an Automated CI Build Server - such as Jenkins.
Again, this requires some learning and should be owned by the entire team now that we have established Collective Code Ownership. You do not need to throw away any of the above practices. Instead you now add to them. I highly recommend that a monitor and speakers be in the development environment so that notice of any build or integration failures found by the CI server is immediately obvious to the team.
The rule/practice for the team is this:-
The team's highest priority is to maintain a green (passing) build.
Should the build break, the team should stop everything, have a huddle to determine the cause of the break, identify who will fix it and make that the highest priority for the team. Swarming and/or mob programming is a good practice here.
While the build is red (broken), no one is permitted to commit any code (by team agreement).
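The red/green rule can be written down as a tiny decision function. This is a sketch of the team agreement, not a feature of any CI server. The names BuildStatus and may_commit are hypothetical, and the fixes_build exception is my own assumption: the text only says nobody commits while the build is red, but the commit that repairs the build has to get through somehow.

```python
from enum import Enum


class BuildStatus(Enum):
    GREEN = "green"  # passing
    RED = "red"      # broken


def may_commit(status, fixes_build=False):
    """Return True if a commit is allowed under the team agreement.

    Assumption: the one commit allowed on a red build is the fix itself.
    """
    if status is BuildStatus.GREEN:
        return True
    return fixes_build
```

A check like this could sit in whatever step the team uses before committing, but the point of the rule is the shared discipline rather than the enforcement.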
Here is another brief description of an Automated CI process.
Continuous Deployment, Continuous Delivery (CD) and DevOps
These are whole topics in and of themselves and also the latest buzzwords in agile. Core to these practices is the CI server. Some clever folks worked out that these automated build machines can do more than just build and test. They can also act as gateways, package, deploy, monitor, and run broader environment and package testing. And so, Continuous Deployment, Continuous Delivery and DevOps were born. Enough said.
Closing and summary
I'm not going to go into any further detail on the uses or setup of a CI server such as build triggers, chained builds, breaking out test suites, stress testing, benchmarking, testing environments, continuous deployment etc. They are each topics in themselves and would take me away from the main topic of this article.
I'm hoping that you noticed the interrelationship between CI and all the other XP practices that have been mentioned in this article. The practices/disciplines are designed to work together in this way and work best when ALL of them are practiced. That is why I prefer to keep XP intact rather than breaking it into a smaller subset of the practices and calling them agile development practices or technical practices. Don't mess with Grandma's recipe. If you want CI, then do XP.
The main points I want to reiterate for my closing summary :-
The first rule of CI is - Check in Early, Check in Often
You should be committing (and therefore building) your code 3-4 times a day. That is why they call it "Continuous"
CI is so much more than a CI server. Learn all the other practices before installing a CI server
Get adept with trunk-based development before moving to distributed
Do ALL of the XP (Extreme Programming) disciplines and not just part
Martin Fowler's blog article on Continuous Integration