The scrapbook of Boris Marinov

Structured programming: how to write proper if statements

The if statement (or if expression) is the cornerstone of every modern programming language - it is so pervasive that we rarely think about how exactly should we use it, or how it is meant to be used. But despite its popularity, if wasn’t always there, and it wasn’t as pervasive as nowadays, so its role is, I’d argue, still somewhat misunderstood. So in this article, I will examine some mistakes that we can easily avoid in order to improve on our code.

Before we begin: My perspective on programming

This is an opinionated text, i.e. it contains my opinions and although I have many arguments that back them, they are ultimately based on my core beliefs about what programming is and should be, so take them as advice, as opposed to a dogma. And here are some of the things on which this advice is based on, structured as postulates.

Code is for humans (and programmers are humans too).

A form of this appears in a little book known in the hacker community as “SICP”, but I don’t know if they were the first to state it. It means that programs are not just instructions for performing a given task - they are text, and as such they should be subject to the same stylistical rules as other text is.

Code should be as bug-free as possible (and easy to debug as well).

This one looks pretty self-explanatory, but the key point is that we have to dedicate effort to make it such, by constantly thinking not only about ways in which the code might go wrong, but also about ways to diagnose potential problems.

I guess I should say something about why I believe these things. Simple - because all other approaches just fail because of the human factor, e.g. any code, that is written without regard for style is bound to become legacy when the person who supports it leaves. Likewise, maintaining code that lacks structure results in bugs (which often can be fixed only by imposing more structure).

Lastly, people often contrast good structure and readability with speed. Firstly, the comparison is a bit unfair, because a well-structured is easier to optimize. And for cases in which it is really the case, in 99% the gain is so little that nobody cares, so the emphasis should remain on readability and structure even then. This is when I would throw this very famous quote by DEK:

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97 % of the time: pre mature optimization is the root of all evil. source

Keep in mind that he wrote this when computers were a thousand times slower than today, so we should care about maintainability even more.

Short history of the if statement

Once upon a time, all there was was the assembly language where people didn’t have brackets and had to use jump instructions instead. For those who don’t know what jump does, if you imagine that somewhere in the virtual machine, or the interpreter, that runs your code there is a variable that denotes which line of code should be executed next, then jump is a functionality that basically allows you to set that variable to whatever value you please - it is like a portal that allows you to go to from any place of the program to any other place. For example, a typical if statement that looks like this when written with jumps:

compare input to waterlevel
if less, jump to A
if equal, jump to B
jump to C

A:
turn motor on
jump to C

B:
turn motor off
C:
...

Source for the assembly pseudocode

You can see from this example that, although powerful, jumps are much harder to work with than if-s, although they can sometimes make your code a bit shorter, they will 100% make it less readable which is hardly a good deal - if the important thing is for code to bug free, so you’d always prefer to have a longer and more readable piece of code, than one which is shorter but harder to understand. This realization led to the rise of what was at the time called “structural programming” paradigm, which was based on the idea that the power of jumps should not be exposed to the programmer - a point which was brought by Edsger W. Dijkstra in the seminal essay Go To Statement Considered Harmful (goto is the equivalent of a jump instruction statements in non-assembly language context). Simultaneously, the idea of a block (or a “compound statement”) meaning a collection statements that perform a common task (usually denoted today by wrapping them in curly brackets) was pushed forward, and the two ideas gave birth to the if statement that we know today (that first appeared in the Algol 60 programming language):

if (input less than waterlevel) {
  turn motor on
} else if (input equal to waterlevel) {
turn motor off
} else {
...

The adoption of structured programming was a long process which began with people even questioning whether structured programs are even capable of encoding all possible computations, but today, although some people are still defending the use of goto, it (the process) is largely finished - goto is banished and the structured style of programming is so pervasive across modern languages that the term “structured” is not used any more, simply because there is nothing to contrast it with.

And to reiterate, the reasons for this are quite obvious - using brackets makes our code much more akin to the way we usually talk and write with each other: we often say

“if A then B otherwise C”

we never say things like,

“if A then go to the previous page and start reading from line 12”.

(I wonder if the programmers who defend the use of goto write and talk like that?)

This already leads me to the first rule that I always follow.

Rule 1 - DO NOT use return for control-flow

A leftover from the goto-driven style of programming is the possibility of so-called “early exit” - the ability to quit the execution of a given function or method from any place of it’s body (most-oftenly by using the keyword return) as opposed to executing the function to the end. So again if we consider our plain old if else statement:

if (input less than waterleve) {
    turn motor on
} else {
    turn motor off
}
...

Using return we can write it like this

if (input less than waterleve) {
    return turn motor on
} 
turn motor off
...

When used like this, return is directly translatable to a goto.

if (input less than waterleve) {
    turn motor on
    Jump to C
}
turn motor off
C:
...

Why not write like this? We already established that goto is bad and I’d argue that mixing the two approaches is even worse in some respects, as it might fool you that the code is structured when it really isn’t. Furthermore, the early exit makes our code as confusing as any other goto-containing code when translated to natural language. Imagine someone saying:

“Bring me a coffee if you get home early.”

and then imagine them saying.

“If you don’t get home early ignore everything I’d say after this sentence. Bring me a coffee.”

So if your language supports it, always rely on the implicit return instead of explicitly ending your execution. If not, the simple rule you can follow is that you should either return from all branches of the if statement or from none of them.

Combining if’s with functions

The most common type of situation that I have seen the premature return pattern is for handling errors. Imagine for example a very long function with some validation at the beginning. Many people would write:

signup (user) {

  if (username is valid ) {
      throw invalid username
  } 

  create account for user
  sign user up
  initialise some other stuff
  write action in log
  send emails
  ...
}

This is a very subtle variant of the issue I am describing (made even more subtle by the fact that throw is used instead of a return). The correct way would be:

signup (user) {

  if (username is valid ) {
      throw invalid username
  } else {
      create account for user
      sign user up
      initialise some other stuff
      write action in log
      send emails
    ...
  }
}

but people would prefer having the main function code directly in the body in order to avoid indentation (especially if there is a lot of it already). According to them (and they have a point) the error handling is somewhat outside of the domain of the function, we can perhaps say that they are at different levels of abstraction. In cases like this, we should use functions to structure our code and to keep the different levels of abstraction separate.

Functions are everything that random sequences of statement are not and can never be - they have names, they have a separate scope (blocks also have their own scope, but they also inherit the scope of the parent block), and most importantly they are easy to debug (e.g. you have the name of the thing that went wrong right there on the top of the stack trace. So let’s look at how the code above can be made to work with functions. I think this is a very cool and underused pattern that allows you to lose the indentation without sacrificing your structure.


signup (user) {
  if (username is valid ) {
      throw invalid username
  } else {
      return signup internal (user)
  }
}

signup internal (user) { // or _signup as some people name it
  create account for user
  sign user up
  initialise some other stuff
  write action in log
  send emails
...
}

Rule 2 - DO NOT conflate multiple logical conditions in a single statement

This is another bad pattern that, we might say, has the same root cause as the first one - in their effort to keep things simple and concise, people don’t include all the structure that needs to be there in a robust piece of code.

In if statements we often have to deal with multiple conditions that are dependent on one another. And sometimes in such cases, people try to deal with all of them in one statement with multiple branches and else if-s. I argue that it’s actually better to have one statement per condition and to nest statements to get the control flow we want.

An example is worth a thousand words here:

  if (a and not b) {
      do this
  } else if (a) {
      do that
  } 

  if (b) {
      do the other thing
  }

This is the typical mess that we end up when we just add stuff to our code without refactoring it, and although it does not look that bad, it is nearly impossible to understand. I feel sorry for the person would have to debug this thing in order to fix the issues that it would no doubt cause, and I am firmly convinced that the only smart thing that they can do is to separate the code that deals with a and the code that deals with b into two separate statements like this:

  if (a) {
      if (b) {
          do that
          do the other thing
      } else {
          do this
      }
  } else {
      if (b) {
          do the other thing
      } else {
          // ???
      }
  } 
  

You can see that this block now starts to look like a piece of logic, as opposed to a random pile of conditionals. We can now reason about it, for example we can point out that not all combinations of values of a and b are handled, but more on that later. In short, our code does not work like a Rube Goldberg machine any more.

Some people would argue that having to repeat the do the other thing block in two branches is a deficiency but I beg to disagree - the two branches represent two different cases and the fact that we handle those two cases in the same way is hardly a reason for us not to enumerate them explicitly - we lose a little bit of conciseness but we gain a great deal in terms of readability and robustness.

But what if we have to repeat more than one line? As I said above, wrap that into a function.

Rule 3 - DO NOT leave unhandled cases

We all know that if statements should handle all possible cases and doing so is very easy, especially if we keep one condition per statement, we just have to ensure that every if statement has an else clause which would handle the cases where the other condition or conditions are not satisfied. Although simple, this rule is sometimes broken. What do we do, for example, when there is nothing that we should do in a given case? Imagine, for example, that we have a “Sing Up” button that can be disabled and that does not work if it is disabled. The specification might say something like “if the button is disabled, do nothing.” Turning this requirement into code in a straightforward manner, would result into this:

on signup click() {
  if (signup enabled ) {
      signupuser
  } else {
  }

Our implementation will be correct, but I’d argue that the specification is wrong - as a user, I hate it when I click stuff and don’t get any feedback (and you would start hating you too, if your users start contacting you with annoying questions). You have nothing to lose and everything to win if you just throw an alert there.

on signup click() {
  if (signup enabled ) {
      signupuser
  } else {
    show popup "signup is disabled because ${reason}'
  }

Aside from making your program better, this line might be useful for debugging - imagine that some user clicks the Signup button and nothing happens. This would mean that there is a bug somewhere in the code path. But if the path contains more than one if statements with no else-s then there would be no way for you to know which of them went wrong. Having some kind of marker of what happened, no matter if it is a message that a user sees, if it’s front-end code, or an exception that you can look up in your logs, if it’s in the back-end would ensure makes all such issues trivial to fix.

Another case when people make this mistake is the situation where you have a bunch of cases that you handle in different ways and you don’t really expect for something else to happen (this pattern is even more pervasive if you break the rule about conflating multiple conditions in one statements).

  if (a === 1) {
      do this
  } else if (a === 2) {
      do that
  } else {
      // What do do here?
  }

My advice here is to do something - return a value, throw an exception, any exception, leave a breadcrumb that when the time comes would help you what went wrong.

Rule 4 - DO NOT encode data in if statements

No matter how punctual are we with our if statements, the logic of our programs ultimately depends on the business requirements, which often seem ugly when implemented directly. In these cases, you might help yourself by encoding some of those requirements in datastructures, or even a DSL, and to write your code around those datastructures. Suppose you have the following requirement.

When the user signs up, charge them for membership. Membership costs 10 dollars for people below 18 years old and 20 for people who are older. Oh, and also pro members pay 2 dollars extra, unless they are subscribed for more than 5 years, in this case they pay 1.

Again, going straight from requirement to code would result in a somewhat messy block (note that there are two subsequent if-s - something which I see as an anti-pattern).

  if (less than 18) {
      charge 10
  } else {
      charge 20 
  }

  if (pro) {
      if (subscribed for 5) {
          charge 1
      } else {
          charge 2
      }
  }

A way to avoid this is to encode this logic as data. Let’s go back to the requirements and create a table with all the conditions. The table would look something like this:

age min age max subscribed for min subscribed for max pro price
0 18       10
19 999       20
0 999 5 999 true 2
0 999 0 5 true 1

After that, we can easily use this table to get the price for a given user, for example with the following functional piece:

rellevant charges = all charges
  filter(age min < user.age < age max )
  filter(subscribed for min < user.subscribed for  < subscribed for max)
  filter(pro membershup and plan.isPro

total = rellevant charges reduce (charge object 1.price + charge.object 2.price)

Not only did we manage to make our code very beautiful, we already have a mechanism to edit the rules without code changes.

Data-driven error handling

Let’s see one more example of how we can utilize data structures for control-flow. This one has to do with error handling. We often have multiple ways in which things can go wrong and we are encoding them with ifs, like this.

if (error1) {
   return Error: error1
} else if (error2) {
   return Error: error2
} else {
   doStuff()
}

Although this approach does not contradict the non-conflation rule (which is only valid for conditions that are dependent on one another) it has certain disadvantages in terms of readability, not to mention the inability to return a list of all errors, which is sometimes preferred. If we view error checks as functions that take an input and return a result indicating whether the input is valid under a given criteria, then we can encode them as a table, similar to the one in the previous example, except that here rows will feature custom logic.

error checks = [
  name: Below 18, check: (user) => user.age < 18
  name: is not pro , check: (user) => !user.isPro
]

As I mentioned, this makes it easy to gather all the errors and return them as an array:

errors = error checks
    filter(error.check(user))
    map(error.name)
if (errors > 0) {
    throw errors
} else {
    do stuff
}

And that’s it, for now. I would like to thank the lobste.rs community for suggesting improvements and for pointing out some mistakes in the original code examples. There are some interesting comments there.

Written on January 13, 2021