Preemptive commit comments

tl;dr version

Rule #1: write commit comments before coding
Rule #2: write what the software is supposed to do, not what you did

Long version

Dan North changed my life

You should read Dan North’s epic post Introducing BDD. It might be a bit of an oversimplification, but the whole (revolutionary) Behavior Driven Development thing began with a simple attempt to replace the word “test” with “should”.
Words do matter.

After being introduced to BDD (and to the Clean Code chapter about method names), I started taking more care over the words used in comments, in commit messages, in tests and in method names. The more I tried to adopt the end user’s perspective, and to focus on what the code was doing rather than on how, the more things made sense.

My tests were sentences such as

when_this_occurs_the_user_should_experience_that()

or the like.
I started thinking that I could use the same approach for my commits’ comments as well.
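
As a rough illustration of that naming style, here is a minimal Python sketch. The ShoppingCart class and its methods are hypothetical, invented only to give the tests something to exercise; the point is that each test name reads as a sentence about what the user should experience.

```python
import unittest


class ShoppingCart:
    """Hypothetical class, used only to give the tests something to exercise."""

    def __init__(self):
        self._items = []

    def add(self, name, price):
        self._items.append((name, price))

    def total(self):
        return sum(price for _, price in self._items)


class ShoppingCartBehavior(unittest.TestCase):
    # Each method name is a sentence describing an expected behavior.
    def test_when_the_cart_is_empty_the_user_should_see_a_total_of_zero(self):
        self.assertEqual(ShoppingCart().total(), 0)

    def test_when_two_items_are_added_the_user_should_see_their_sum(self):
        cart = ShoppingCart()
        cart.add("book", 12)
        cart.add("pen", 3)
        self.assertEqual(cart.total(), 15)
```

Running the class with a test runner (for example `python -m unittest`) lists the sentences, so the report itself reads like a small specification.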

Tell me what the software does (not what you did)

When I check out a commit, I’m interested in what I will find in it.

I’m aware that someone has worked on the code so that the software exposes some features, but I’m not really that interested in what they did: I’m more interested in knowing what the software does in that specific snapshot.

I want an answer to the question

What’s the project behavior in this snapshot?

rather than to

What did the programmers do, in their coding session?

Hence, comments like

Fixed Gitk and Git Gui [spdr870]

(I took a random commit comment from the GitExtensions project)

are much less informative than something like

When history is over 10k commits, gitk doesn't crash trying to display it

Welcome back!

A colleague of mine came back after a week of holidays. We needed to update her on all the changes we had made to the code base.

Put yourself in her shoes and try to regain contact with the project, reading this git log:

Now, do it again, supposing we wrote our commit comments as if they were items in a “What’s New” post:

Which one conveys more valuable information?

Dan North, again

As an experiment, we began to write comments describing the behavior of the software, rather than the implementation, or a description of what we did.
And we found that it was leading to a more readable history of the project.

Commit comments started to look much more like BDD method names, or short versions of a tiny requirement: a description of a behavior.

The principles my colleagues and I started to follow were:

  • Talk about the feature, not about yourself: don’t write what you have done; describe what the software does (thanks to what you did)
  • Don’t refer to the past: don’t describe what the state of the software was; tell what the state is right now. Use the present tense.
  • I know it’s now: the commit has a datetime: you don’t need to specify “now” or any other time reference.
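
To make these principles a little more concrete, here is a minimal Python sketch of a checker that flags comments breaking them. It is only a heuristic under stated assumptions: the function name review_commit_comment and both word lists are hypothetical and deliberately tiny, and a team would need to tune them to its own vocabulary.

```python
import re

# Assumed, deliberately small word lists; they are illustrative, not exhaustive.
PAST_TENSE_OPENERS = {"fixed", "added", "changed", "removed", "converted",
                      "refactored", "updated", "implemented"}
TIME_REFERENCES = {"now", "currently", "finally", "today"}


def review_commit_comment(comment):
    """Return a list of warnings for a commit comment (empty if none apply)."""
    words = re.findall(r"[a-z']+", comment.lower())
    if not words:
        return ["the comment is empty"]
    warnings = []
    # A comment opening with "I" or a past-tense verb describes the author's
    # action (and the past) rather than the current behavior of the software.
    if words[0] == "i" or words[0] in PAST_TENSE_OPENERS:
        warnings.append("talks about what the author did, not about the software")
    # The commit already carries a timestamp, so time references add nothing.
    if any(word in TIME_REFERENCES for word in words):
        warnings.append("contains a time reference")
    return warnings
```

With these word lists, “Fixed Gitk and Git Gui” gets the first warning, while “When history is over 10k commits, gitk doesn't crash trying to display it” passes with no warnings.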

Here comes Eric Willeke

Then, unexpectedly, I read this tweet by Eric Willeke

This is what my idea was missing! This is what BDD and TDD are actually prescribing: first assert the behaviour with a test, then code. If my commit comments are like BDD assertions, they should be written before coding. Eric must be right.

We tried.
Know what? It worked pretty well.

Write Pre-Emptive Comments

Just like you write tests before you implement a feature, you should write commit comments before you start coding.
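
One possible way to put this into practice, sketched in Python, is to save the comment in a file before coding and to commit with that file once the work is done. This is an assumption about tooling, not the workflow our team prescribed: the file name and both helper functions are hypothetical. The only git feature relied upon is `git commit -F <file>`, which reads the commit message from a file.

```python
from pathlib import Path

# Hypothetical file name; any path kept out of version control would do.
PENDING_COMMENT_FILE = Path(".pending-commit-comment")


def declare_intent(comment, path=PENDING_COMMENT_FILE):
    """Record the commit comment before any code is written."""
    comment = comment.strip()
    if not comment:
        raise ValueError("a preemptive commit comment cannot be empty")
    path.write_text(comment + "\n", encoding="utf-8")


def commit_command(path=PENDING_COMMENT_FILE):
    """Build the git command that commits with the previously declared comment."""
    if not path.exists():
        raise FileNotFoundError("no preemptive comment was declared")
    # `git commit -F <file>` takes the commit message from the given file.
    return ["git", "commit", "-F", str(path)]
```

The pair would call declare_intent(...) before touching the code, and pass the list returned by commit_command() to something like subprocess.run once the goal is reached. Keeping the comment in a file also makes it easy to re-read after an interruption.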

Here’s a list of benefits we noticed:

    • More focus while developing: I love pairing. I like how the navigator can take a note on a to-do list whenever the pair finds a reason to digress; when this happens, the navigator just writes down a note and brings the focus back to the feature to be implemented. The navigator’s notes may eventually become the next technical tasks to be performed. No matter the distraction, the behaviour described in the test is the center of gravity.
      Just like a good BDD test, a preemptive commit comment can help the pair focus on the specific task to be performed. A preemptive commit comment acts pretty much like a preemptive test; it’s just much easier to write. With preemptive commit comments, a pair is less prone to digressions.
    • Commit review is easier: I’m used to carefully reviewing each diff before committing my changes. Once I started writing commit comments before coding, that review became much simpler. Commit review becomes a let’s-confirm-we-did-what-we-promised task rather than a let’s-discover-what-we-did activity.
    • Less cognitive load: I am used to writing a failing test before leaving the office. It helps me quickly switch back to the work context the next day. As a matter of fact, one of TDD’s nice side effects is the lower cognitive load on the team: after an interruption, if a failing test tells the developers what the next step is, they get back into the workflow more easily.
      A preemptive commit comment does the same, at a finer granularity. Every time I come back from lunch, I just read my preemptive commit comment, and my brain quickly switches back to what I was doing one hour before. When I don’t remember in enough detail, I launch the test suite, looking for red tests.
    • More accurate and faithful comments: since you have to write them before coding, you want to be just specific enough; a commit comment becomes a declaration of intent, just like a BDD method name. Isn’t it ironic that a Git check-in is called a commit? You are, in fact, making a commitment, an engagement. It is only natural to anticipate what you’re going to do with a declaration of intent. Once you have written down your declaration of intent, you will just want to develop code adhering to what the comment claims, abstaining from other goals; you will not want to be too specific or too generic. Comments and commit content naturally tend to match better.

      No more “Bug fixing”, “Just changed file Main.cs”, “Improvements” or “I made [foo], [bar], [baz] and also [quz]” comments.

    • Each preemptive comment triggers a micro design session: before writing a commit comment, you will want to discuss the exact word to describe a specific behavior, or the exact name to give to a domain concept; you will want to know which module has the responsibility to send the PDF file by email in your gorgeous ERP plugin, since you have to decide whether it’s better to write “PDF Exporter sends a copy of the document by email” or “Email Notifier detects files exported by PDFExporter and sends them by email”. In order to write a commit comment, you must be able to describe the feature: this will make you stop for a while and do a nice, light, old-fashioned brainstorming design session.
    • A preemptive comment sets a micro goal: the mere act of writing a commit comment helps you focus on a goal to reach. Since the goal the pair is committing to has been defined from the beginning, it’s simpler for the pair to realize when the goal has been reached and the job is finished. A well-defined commit comment embodies a little definition of done.
      Without preemptive comments, I often used to go on coding, always asking myself: “Should I commit now? Have I reached a stable state which I could consider a good commit?”. I found that I can define micro-goals through preemptive comments, and a macro-goal through the feature branch name (which is also pre-emptive by design).
    • A preemptive comment creates a little timebox: I found that, with practice, I could divide a coding session into several micro-timeboxes. This is because I have a moment when I start working from a stable state, and a clear goal to reach. I am encouraged and pushed to reach the goal. Can you believe it? Just because of a stupid comment. The human brain is weird…
    • Writing comments preemptively puts the agreement between the pair members to the test: driver and navigator must agree on the comment, that is, on the goal to reach. I found that very often the mere fact that we must write the goal before coding requires us to discuss and share our points of view. And it’s awesome. By the time you start coding, you will be much more in agreement with your pair on what you want to achieve.
    • The commit history gains a very balanced granularity: each commit has a precise goal. A feature branch becomes a collection of evolutionary commits, each of which usually has a 1:1 binding with a test; each test has its commit. It’s also very easy to find which commit introduced a bug, since each commit is related to a single new goal/feature. You could say that each commit honors the Single Responsibility Principle.

The experiment was originally conducted by Arialdo Martini, Mattia Piccinetti, Gian Marco Gherardi, Guglielmo Brasile, Francesco Pichierri and Giuseppe Mariano.

We are now using this technique on a regular basis.

(Should you decide to give this practice a try, please write me a note with your outcomes: I would love to share your opinions and ideas)

Cheers

45 thoughts on “Preemptive commit comments”

  1. Really Smart! Just wondering how to handle those scenarios where you have to atomically change more than one type to keep the code buildable. Indeed, I can’t think of a commit that intentionally breaks the build… In such cases pre-emptive commit messages are still valuable but don’t map to a single feature/behaviour/test.

  2. I’m incredibly appreciative to Arialdo for trying this out, and excited about his results!

    The unexpected killer aspect to this one for me was how it triggered the just-in-time design sessions. I’ve always valued the emergent architecture/design aspect of Agile quite a bit, and as a result my teams are always in the habit of sketching before coding, but I’ve not seen that behavior very consistently as a coach. For that to naturally emerge from a different practice is incredibly exciting!

  3. Wow. I can’t believe anybody thinks commit messages like that are a good idea. This is completely the wrong way to track this information. Your issue tracking system should contain the information about what the change does (in plain language) and be linked from the commit message, while the commit message documents what the programmer did. The commit history should be a tool for the programmers, and the issue tracking system should have more general visibility to stakeholders. Take just your first commit message as an example: “Converted from SaveFileDialog to SmartFileSelector” vs “When the user saves the file, a Preview is shown, thanks to SmartFileSelector”. When you have a bug in SmartFileSelector and you review your git history, your first message shows I can ignore that commit; the second means I have to read the diffs to find out whether you changed the code of SmartFileSelector. Also, by committing at the granularity of “change does X”, you force a large commit, as the entire change X must be completed in one commit, whereas pointing to an issue can be carried through many smaller commits, which are easier to review/revert/cherry-pick/merge. All of your benefits can be gained by putting that effort into writing your issues, and you can pair to write your tickets if that is beneficial to you. And by having your commit message summarise what you did, that information is captured; under your system that information is lost.

    1. You may be right. Actually, we are exploring and experimenting with this approach, and I’m sure it can be improved. And yes: it may be completely wrong.

      About the granularity: inside a feature branch, I try to act like this:

      * I write an end-to-end test describing the whole feature to be implemented. This test will stay red, and it will eventually turn green at the end of the feature branch.
      * I commit: this is the goal to be reached. When the e2e test is green, some business value is added to the project. The commit comment says the project is lacking a feature.
      * I start with TDD, with a long series of comment+test/code/refactor. Each commit comment describes a behavior. At this level, the comment is very, very developer-centric. It’s still about a behavior, but usually it’s about a class behavior, not about what the end user sees. Hence, I feel the git history can still be used as an effective tool for developers.

      The post shows an example with end-user-centric commits; I should have included a whole feature branch too, since you’re right: the example given has comments describing end-user features. Here are 2 (real) examples of feature branches



      As you can see, these are much more low-level. They talk about classes’ behavior.

      I would also be very worried if this technique led to what you describe as “X must be completed in one commit”.
      No, actually it may even increase the number of commits.

      About the issue tracking, what you say is very interesting. I also feel the need to link the git history with issue tracking. My feeling about this is that git history should be self-contained. The issue tracker is a communication tool; git history is the project. In other words, ideally, I wish I were able to completely drop the issue tracker without losing information.

      Thanks for your (negative) feedback: I feel it will be useful to review what we did.

      1. It’s interesting that you feel you need to be able to drop the issue tracker. I’m curious as to why. I feel much happier with your explanation that it increases your number of commits. I’d be interested in your thoughts on changes that cross multiple repositories. I hope to follow the changes to your approach (or why you choose not to make any).

      2. Well, I didn’t mean that I want to drop my issue tracker, or that I want to replace it with git.

        I just meant that I don’t want to force a developer to access the bug tracker in order to understand the project history.

        In other words, I try to avoid commit comments like

        Fixed the bug #652 (see BugZilla for more information)

        The history should be self explanatory.

        About the multiple repositories, it’s an interesting and not trivial question. Let me chew the cud…

    2. IMHO, in git, issues map to feature branches, not to commits. Thus, if you name the branch after the issue, the issue number in each commit is almost useless.
      Moreover, if you need to know which files a commit changed you should use “git show --name-only COMMITHASH”, and if you need to know which commits have changed a file you should use “git log --follow Path/To/File”. Replicating such info in the commit message is actually useless in most scenarios.

      1. Definitely, you are right. This is what I tried to say with “I can define micro-goals through pre-emptive comments, and a macro-goal through the feature branch name (which is also pre-emptive by design)”.

        Anyway, this requires the use of feature branching (which I love).

        Should you wish to avoid feature branching (for example Martin Fowler thinks Feature Branching is evil, see Why Does Martin Fowler Not Understand Feature Branches and Martin Fowler Has A Merge Paranoia), you could map issues to a commit comment, rather than a branch name.

        Personally, I disagree with Fowler: I don’t see any reason to avoid feature branching, and I completely agree with your comment above.

  4. Just a note: I’ve just fixed some documentation over the code and… there’s no better commit message than “fix doc”. This is quite interesting, since comments are first-class citizens of the contracts that the code expresses (e.g. .NET doesn’t support checked exceptions, thus good documentation completes the contract).

    However, if I fix the doc by adding a new tag, I should use a commit message like “MyDumbMethod throws MyException”. I still fixed the doc only, but in this case “fix doc” is not enough, since clients should check whether or not they already catch the exception.

  5. First of all, I like the basic idea: as with test-first, I write a description of what I’m going to do in my commit comment before I actually do the work. Here are my 2 cents:

    I see a commit as a change set, as opposed to a “snapshot” of the behaviour, which I think is better reflected in the tests/specs in your codebase. And there are different kinds of changes to the system:
    – adding a feature: these are the cases you described above, where you added a new behaviour to the system
    – removing an obsolete feature: here you can only describe what behaviour the system will no longer support, and why (there should be a user story regarding the removal!)
    – refactoring (no changes in behaviour): in this case, as no behaviour is changed, the comment would be empty or only “refactoring”. But I think that is not enough: I would also add the rationale / the why behind this design work, e.g. “refactored to remove duplication”, “refactoring: split up class A into class A and B to enforce Single Responsibility Principle”.

    1. On refactorings: the reader of the commit message is a fellow developer, not an end user. And to the fellow developer the new state of the code is really interesting. You could e.g. say “Method longAndNasty has sensible names for temporary variables”

    2. I see an analogy to when you change some other architectural aspect. E.g. if you have increased capacity, it is uninteresting to any single user, but really interesting to other stakeholders. Refactoring touches the architectural aspect “extensibility”.

    3. Absolutely inspiring. Thank you.

      I like your analysis: adding, removing and refactoring seem to complete the set of types of changes that can be applied to a system. Maybe we could also include “trying a spike” or “prototyping” (that is: commits containing code that, by design, will be discarded), but I’m not sure, it’s just an idea.

      I absolutely agree on your note about removing an obsolete feature.

      Talking about what to write in refactoring commit comments with Dan Bergh Johnsson, he wrote this inspiring tweet:

      I completely agree with him.
      These days, while commenting on a feature branch, I think of myself as talking to a developer; when I have to merge the feature branch to dev, that is, when I’m adding a feature which is visible to the end user, I try to commit using a language meaningful to the end user.

      Hence, I’m not sure I agree with you. I’m not that sure that one should just write “refactoring” as you are suggesting.

      Also, about your examples “refactored to remove duplication” and “refactoring: split up class A into class A and B to enforce Single Responsibility Principle” I think that the use of the Past Participle is hiding the fact you are talking about an action you did (“[I] refactored to remove duplication“), not a state of the software. In other words, I think this rule of thumb could be valid:

      If a comment explicitly or implicitly begins with “I”, it’s wrong. You should describe what the program does or how the code is, not what you did.

      Also, those comments are somehow referring to the past. Maybe I’m too strict on this, but I think that a commit is always a collection of diffs from another commit, hence it may be really useless to point out that “this commit introduces a difference from the past”. I mean, it’s rather obvious information.

      Again, about time references: I aim to find an absolute description for the state of the software or the code. The future reader of the commit is interested in what she will find in that commit. Anything existing before that commit is past. Did you fix a bug? Well, that bug is a thing of the past. In a few commits that bug will be forgotten. Or, at least, no one will have any interest in knowing that before your fixing commit there was a bug. What’s important is the current behavior.

      In the future, that commit could be cherry-picked by someone, and seen (and, somehow, correctly interpreted) as an “add-a-feature” commit, not as a “fix-a-bug” commit.

      The same, I think, applies to refactoring.

      For example, rather than your

      refactoring: split up class A into class A and B to enforce Single Responsibility Principle

      (really? How were A and B classes before? Before when?)

      I would write

      class A has responsibility X, class B has responsibility C

      1. I’ve got your point, but I see the model of the commit log still more as a history of changes than your history of snapshots of what the software does. A change has a “before” (the past, that you don’t wanna read) and an “after” (the state / behaviour you want to read). For adding features the past is not worth mentioning (the feature was not implemented). IMO for those commits your model works perfectly. But for the other kinds of changes (bugfixing, refactoring and removing of a feature) I think the “before” is equally or even more interesting for the reader of the log, because as a reader of the log I also want to know, WHY the change was necessary. In these cases the not intended state “before” is actually the reason for the change:
        – removing: not the functionality “before”
        – bugfixing: not the bug that existed “before”
        – refactoring: not the design flaw that existed “before”
        Combining the what and the why could sound like
        – bugfixing: “affected feature XY now works correctly without bug 1234”
        – refactoring: “a centralized constant MAX instead of dispersed literals across the class”
        Aren’t you missing the WHY? And if so, how would you write the bugfixing example then?

      2. The “why” argument is really a good point.

        It reminds me of the typical “In order to do X, As a Y, I want the software to do Z” sentence, where the “in order to do X” captures the reason why someone wishes the feature.

        In a random chat, Marco Pegolo suggested that I use exactly that sentence, at least in commits that will be merged to the release branch, that is, in commits that will eventually be read by business users. I think I agree with him, but I also think that your examples achieve the same effect.

        Especially in the case of bug fixing I think a reference to the previous (bugged) state could be useful. I can agree with you.
        Let me ruminate on this for a while 🙂

      3. I think there’s a lot of value in following the pattern for the refactoring/removal cases, because I pretty strongly believe it’s the current reality (as of the point of the commit) that’s valuable… including the “why” if it’s not obvious.

        Examples include “New subscriptions can no longer access the ‘program’ menu” (as an example of a removed feature, and would be attached to a story like “In order to reduce our overall interface complexity, as portfolio owner I no longer want users to utilize the legacy ‘program’ capability”)

        For refactoring activity, I might see:
        “CommentRepository has ownership of serialization of comments (instead of PostRepository)”

        In the former, I needed to elaborate on the ‘why’, while in the latter it should be generally obvious (especially if the next commit is “CommentRepository supports JSON output in addition to XML” or some such)

      4. @Arialdo
        Stating the “why” (BDD’s “In order to do X”) delivers two benefits depending on the point in time:
        – before performing the changes of a commit, it reminds the programmer to think about the value of the following efforts in advance
        – reading the comment after the commit tells the history of changes including the motivations for them back in time

        @Eric
        +1 for “why on demand” depending on the context

