Twenty Three Hundred

Functions

Dr Charles Martin

Semester 1, 2022

Week 5: Functions

Outline

why functions?
calling conventions
the stack

because copy-pasting sucks

Function gallery

def plus_1(x):
  return x + 1

public int plusOne(int x) {
  return x + 1;
}

(define plus-1
  (lambda (x)
    (+ x 1)))

first, some analogies

Good: pipe (input & output)

or “black box”

Better: there, and back again

$f(a, b) = \int_a^b g(x) \mathrm{d}x$

A function call

program control flow during a function call

talk

Can we do this with branch (b)?

Open questions

how does the program know where to come back to?
how do we pass information (i.e., parameters) in?
how do we get information (i.e., return values) back?
can we have some “scribble paper”?

note: parameters/arguments - different words for the same thing

Remember Hansel and Gretel?

They try to leave a trail of breadcrumbs behind them so they can find their way back.

`bl` and `bx`

`bl`: branch with link

When the branch with link instruction (bl) is executed, the address of the next instruction (i.e., the one after the bl instruction) is placed in a specific register

`lr`: the “link register”

Just like r15 (pc), r14 also has a special meaning—it’s the link register

`bx`: branch and exchange

The lr might contain the address of the instruction we want to go back to, but how do we actually return there?

The branch and exchange (bx) instruction branches not to a static label, but to an address in a register

Don't worry too much about the "exchange" part

The “exchange” part means that bx can switch the CPU between “ARM” and “Thumb” execution modes.

We only ever use Thumb mode.

The way this works is tricky. bx rN says “branch to the address located in rN”.

Code addresses are aligned to half-words, so the lowest bit of a code address is always zero. bx uses this otherwise-unused lowest bit to select the execution mode (1 means Thumb).
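A minimal sketch of that bookkeeping, as a toy Python model rather than real hardware. The helper names `bl` and `bx` and the addresses are made up for illustration; the 4-byte size assumes the 32-bit Thumb encoding of bl.

```python
# Toy model of bl/bx on a Thumb-only core; addresses are illustrative.

def bl(pc, target, instr_size=4):
    """Branch with link: return (new_pc, lr)."""
    return_address = pc + instr_size      # the instruction after the bl
    lr = return_address | 1               # bit 0 set marks Thumb state
    return target, lr

def bx(reg_value):
    """Branch and exchange: return (new_pc, thumb_mode)."""
    thumb_mode = bool(reg_value & 1)      # lowest bit selects the mode
    new_pc = reg_value & ~1               # the real address has bit 0 cleared
    return new_pc, thumb_mode

pc, lr = bl(pc=0x08000100, target=0x08000200)
back_to, thumb = bx(lr)
print(hex(lr), hex(back_to), thumb)       # 0x8000105 0x8000104 True
```

So the value you see in lr in the debugger is usually odd, even though the instruction it refers to sits at an even address.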

Putting it all together

bl sets the link register

What about conditional branches?

Neither of these new branch instructions (bl and bx) can be used conditionally on its own (e.g. with an eq suffix) in the ARMv7-M ISA your microbit uses

You can get around this with IT blocks if you want, or you can use a regular conditional branch (e.g., bgt)

  cmp r0, #8
  IT eq
  bleq add_one

Function template

@ use the type directive to tell the assembler
@ that fn_name is a function (optional)
  .type fn_name, %function

fn_name: @ just a normal label
  @
  @ the body of the function
  @
  bx lr  @ to go back

Functions are simple

use a bl <label> to branch with link

use a bx lr instruction to come back

Analogy Time: RPG quests

Nested Functions

Nested functions

nested function execution flow

did the breadcrumbs thing work for Hansel & Gretel?

Nested `Plus_1` (broken!)

nested Plus_1

talk

How can we stop the “first” return address getting clobbered?

Sure, store it to memory, but where?
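To see why the first return address gets lost, here is a small Python sketch with a single shared link register. The dictionary `regs` and the label strings are hypothetical stand-ins for the real register and real code addresses.

```python
# Toy model: one link register shared by every call.
regs = {"lr": None}

def bl(return_label):
    regs["lr"] = return_label   # overwrites whatever was there before

bl("back_in_main")              # main calls outer
outer_return = regs["lr"]       # where outer *should* return to
bl("back_in_outer")             # outer calls inner, clobbering lr
# inner does `bx lr` and correctly lands back in outer...
# ...but when outer then does `bx lr`, lr no longer points into main
clobbered = regs["lr"] != outer_return
print(regs["lr"], clobbered)    # back_in_outer True
```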

Nested `Plus_1` (fixed!)

nested Plus_1 stores the link register to memory

this will work in this case, but there’s still a slight problem with the use of sp here—can you spot it?

The stack (sneak peek)

One final new register: the stack pointer (sp, but it’s actually r13)

By convention: the value of the sp is an address in the SRAM region of the address space (like with the .data section)

basically, it’s memory you can use to get things done

We’ll return to the stack later…

Calling conventions

Open questions

~~how does the program know where to come back to?~~
how do we pass information (i.e., parameters) in?
how do we get information (i.e., return values) back?
can we have some “scribble paper”?

assume x is in r0…

We need a convention

an agreed-upon plan for where to find the input(s) and where to leave the result

Calling convention definition

This is called a calling convention (CC)

It’s a contract between the caller (the code which makes the function call with bl <label>) and the callee (the code between <label> and the bx lr instruction)

What does the CC specify?

where to look for the parameter values (the inputs)
where to leave the outputs
which registers to touch, which to leave alone

talk

Which calling convention does this function use?

int do_all_the_things(int how_many_things){
  // lies! does *none* of the things
  return 0;
}

trick question!

There are many possible CCs

It doesn’t matter which calling convention you use (as we’ll see), as long as the caller and the callee use the same convention

CC example

Do these two Plus_1 functions both give the right answer (i.e., x+1)? What’s the difference?

Plus_1:
  add r0, r0, 1
  bx lr

Plus_1:
  add r5, r2, 1
  bx lr
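One way to think it through: model the registers as a Python dictionary and let the caller choose where it puts x and where it looks for the result. The helper `call` and the names `plus_1_a`/`plus_1_b` are invented for this sketch.

```python
def plus_1_a(regs):
    regs["r0"] = regs["r0"] + 1     # add r0, r0, 1

def plus_1_b(regs):
    regs["r5"] = regs["r2"] + 1     # add r5, r2, 1

def call(fn, x, arg_reg, result_reg):
    regs = {f"r{i}": 0 for i in range(13)}
    regs[arg_reg] = x               # caller leaves the parameter here
    fn(regs)
    return regs[result_reg]         # caller looks for the result here

print(call(plus_1_a, 41, "r0", "r0"))  # 42: caller and callee agree
print(call(plus_1_b, 41, "r2", "r5"))  # 42: a different convention, still consistent
print(call(plus_1_b, 41, "r0", "r0"))  # 41: mismatch, caller looks in the wrong place
```

Both functions are "right", but only for a caller that follows the same convention.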

AAPCS

The ARM Architecture Procedure Call Standard (AAPCS) is the convention we’ll (try to) adhere to in programming our microbits.

The full standard is quite detailed, but the general summary is:

r0-r3 are the parameter and scratch registers
r0-r1 are also the result registers
r4-r11 are callee-save registers
r12-r15 are special registers (ip, sp, lr, pc)

What are scratch registers?

r0-r3 are “scratch” registers, which means that the callee can freely use them (and not worry about restoring them before returning)

These are also called “caller-save” registers, because if the caller wants to preserve the values in them they need to save them somewhere

Parameters and Return Values

Different ways to get data in/out

Do these two Plus_1 functions both give the right answer (i.e., x+1)? What’s the difference?

@ pass by value
Plus_1:
  add r0, 1
  bx lr

@ pass by reference
@ (assumes the caller left x in memory at the address in sp)
Plus_1:
  ldr r3, [sp]
  add r3, 1
  str r3, [sp]
  bx lr

Pass-by-value vs pass-by-reference

Two different approaches to passing parameters and return values in and out of a function.

pass by value makes a “copy” (can mess with it without affecting the caller)
pass by reference gives the callee access to the same bits as the caller

pros and cons to both, depends on the nature of the things being passed in and out

in general, data needs to live in memory (registers are not for long-term storage)
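The same distinction, sketched in Python with a dictionary standing in for RAM. The address `0x20000000` and the function names are arbitrary choices for the example.

```python
memory = {0x20000000: 10}            # one word of "RAM" at a made-up address

def plus_1_by_value(x):
    x = x + 1                        # works on a private copy
    return x

def plus_1_by_reference(address):
    # ldr, add, str on the caller's own word
    memory[address] = memory[address] + 1

original = memory[0x20000000]
result = plus_1_by_value(original)
unchanged = memory[0x20000000]       # the caller's data is untouched
plus_1_by_reference(0x20000000)
changed = memory[0x20000000]         # the caller's data has been updated
print(result, unchanged, changed)    # 11 10 11
```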

The stack

Open questions

~~how does the program know where to come back to?~~
~~how do we pass information (i.e., parameters) in?~~
~~how do we get information (i.e., return values) back?~~
can we have some “scribble paper”?

What about local variables?

function doStuff(a, b){
  let c = a+b;
  let d = a-b;
  let e = a*b;

  // function body here

}

maybe put c, d and e in more registers?

What about local variables?

function doArrayStuff(a, b){

  let person = {
                 name: "Esmerelda",
                 age: 54,
                 pets: ["rex", "daisy"]
               };
  let junk = new Array(1000);

  // function body here

}

there aren’t enough registers this time

The stack pointer (revisited)

The stack pointer (sp) contains a memory address, and this can be used by functions for various purposes:

“saving” values in registers which would otherwise be overwritten (e.g. lr)
passing parameters/returning values
temporary variables, e.g. “scribble paper”

It’s called the stack because (in general) it’s used like a first-in-last-out (FILO) stack “data structure” with two main operations: push a value on to the stack, and pop a value off the stack

but only if you follow the rules

Setting up the stack

Look at the first instruction executed in the startup file:

ldr   sp, =_estack

Loads a value (_estack) into sp using the ldr pseudo-instruction

The exact value of _estack comes from the linker file (line 34):

/* Highest address of the user mode stack */
_estack = 0x20018000;    /* end of RAM */

Stack pointer in memory

Stack pointer memory

More about the stack pointer

the value (remember, it’s a memory address) in sp changes as your program runs
sp can either point to the last “used” address (full stack) or the first “unused” one (empty stack)
you (usually) don’t care about the absolute sp address, because you use it primarily for offset (or relative) addressing
stack can “grow” up (ascending stack) or down (descending stack)
in ARM Cortex-M (e.g., your microbit) the convention is to use a full descending stack starting at the highest address in the address space which points to actual RAM
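A rough Python model of a full descending stack, assuming 32-bit words and the _estack value from the linker file. The names `push`, `pop`, and `memory` are just for this sketch.

```python
ESTACK = 0x20018000          # initial sp (the _estack value from the linker file)
memory = {}                  # sparse model of RAM: address -> 32-bit word
sp = ESTACK

def push(value):
    global sp
    sp -= 4                  # descending: the stack grows towards lower addresses
    memory[sp] = value       # full: sp points at the last word written

def pop():
    global sp
    value = memory[sp]       # read the word sp points at
    sp += 4                  # then release it
    return value

push(0xfe)
sp_after_push = sp
top = pop()
print(hex(sp_after_push), hex(top), hex(sp))   # 0x20017ffc 0xfe 0x20018000
```

Notice that nothing is ever stored at `ESTACK` itself: with a full descending stack, the first push lands one word below it.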

Stack Instructions

Using the stack

Just use sp like any other register containing a memory address:

mov r2, 0xfe

@ push the value in r2 onto the stack
str r2, [sp, -4]
sub sp, sp, 4

@ do some stuff here

@ pop the value from the "top" of the stack into r3
ldr r3, [sp]
add sp, sp, 4

Push, illustrated

stack push example

Pop, illustrated

stack pop example

the “missing” values in the diagrams aren’t empty, just unknown

Offset load and store with write-back

ldr/str with offset can write the new address (base + offset) back to the address register (in this case r1) in two different ways

pre-offset: update the index register before doing the store (or load)
```
@ r1 := r1 + 4
str r0, [r1, 4]! @ note the "!"
```
post-offset: update the index register after doing the load (or store)
```
@ r1 := r1 - 8
ldr r0, [r1], -8 @ no "!" for post-offset
```
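If the ordering of "update" and "access" is hard to keep straight, this Python sketch spells out both forms. The helper names `str_pre` and `ldr_post` and the starting address are invented for illustration.

```python
def str_pre(memory, regs, src, base, offset):
    """str src, [base, offset]!  -> update base first, then store."""
    regs[base] += offset
    memory[regs[base]] = regs[src]

def ldr_post(memory, regs, dst, base, offset):
    """ldr dst, [base], offset  -> load from base, then update base."""
    regs[dst] = memory[regs[base]]
    regs[base] += offset

memory = {}
regs = {"r0": 0xbc, "r1": 0x20000100, "r3": 0}
str_pre(memory, regs, "r0", "r1", 4)      # stores at 0x20000104
stored_at = regs["r1"]
ldr_post(memory, regs, "r3", "r1", -8)    # loads from 0x20000104, then r1 -= 8
print(hex(stored_at), hex(regs["r3"]), hex(regs["r1"]))
# 0x20000104 0xbc 0x200000fc
```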

Pre-offset addressing

load/store pre-offset

Post-offset addressing

load/store post-offset

Stack pointer example (again)

Pre/post offset addressing means fewer instructions

mov r2, 0xbc

@ push
str r2, [sp, -4]!

@ do stuff...

@ pop
ldr r3, [sp], 4

`push` and `pop` instructions

Doing this with the stack pointer (sp) as the base address is so common that the ISA even has specific push and pop instructions

mov r2, 0xfe

@ gives same result as `str r2, [sp, -4]!`
push {r2}

@ do stuff...

@ gives same result as `ldr r3, [sp], 4`
pop {r3}

note that the sp base address is implicit

Register list syntax

There was one other difference in the push and pop syntax: the brace ({ }) syntax around the register name

Certain instructions take register lists—they can apply to multiple registers at once, e.g.

@ push r0, r1, r2, r9 to stack, decrement sp by 4*4=16
push {r0-r2,r9}

@ pop 4 words from the stack into r0, r1, r2, r9
pop {r0-r2,r9}
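One detail worth knowing: the order you write the registers in the list doesn't change the layout, because the lowest-numbered register always ends up at the lowest address. A small Python sketch of that rule (the `push` helper and register values here are hypothetical):

```python
def push(memory, regs, sp, reg_list):
    """Model push {reg_list}: returns the new sp."""
    ordered = sorted(reg_list, key=lambda name: int(name[1:]))
    sp -= 4 * len(ordered)                # one word per register
    for i, name in enumerate(ordered):
        memory[sp + 4 * i] = regs[name]   # lowest register -> lowest address
    return sp

regs = {"r0": 10, "r1": 11, "r2": 12, "r9": 19}
memory = {}
sp = push(memory, regs, 0x20018000, ["r9", "r0", "r2", "r1"])
layout = [memory[sp + 4 * i] for i in range(4)]
print(hex(sp), layout)                    # 0x20017ff0 [10, 11, 12, 19]
```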

`push` instruction encoding

from A7.7.99 of the reference manual

Push instruction encoding

Load/store multiple

There are also instructions for loading/storing multiple words using any register as the base register

ldmdb load multiple, decrement before
ldmia load multiple, increment after
stmdb store multiple, decrement before
stmia store multiple, increment after

But if sp is the base address, then push and pop are probably easier to read

be careful about the order!

Functions and Stack Frames

Function prologue & epilogue

The beginning (or prologue) of a function should:

store (to the stack) lr and any other values (e.g. parameters) in registers which will be clobbered during the execution of the function (remember the AAPCS)
make room for any temporary variables by decreasing the stack pointer

Function prologue & epilogue

The end (or epilogue) of a function should:

re-increment the stack pointer to free up the room for temporary variables
restore all the stored values back to the registers (e.g. lr)
make sure the return value is left in the right place
restore the stack state (e.g. put the sp back where it was)

Share house kitchen

Function prologue & epilogue example

  .type my_func, %function

@ assume three parameters in r0-r2

my_func:
  @ prologue
  push {r0-r2} @ sp decreases by 12
  push {lr}    @ sp decreases by 4
  
  @ body: do stuff, leave "return value" in r3

  @ epilogue
  mov r0, r3 @ leave return value in the right place
  pop {lr} @ sp increases by 4
  add sp, sp, 12  @ balance out the initial "push"
  bx lr
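A quick sanity check you can do for any function is to add up the sp changes and make sure they cancel. Here is that bookkeeping for my_func in Python, assuming sp starts at 0x20018000 (any starting value would do).

```python
sp = 0x20018000
trace = [sp]

def adjust(delta):
    """Move sp by delta bytes and record the new value."""
    global sp
    sp += delta
    trace.append(sp)

entry_sp = sp
adjust(-12)   # push {r0-r2}
adjust(-4)    # push {lr}
# ... function body runs here ...
adjust(+4)    # pop {lr}
adjust(+12)   # add sp, sp, 12
balanced = sp == entry_sp
print([hex(a) for a in trace], balanced)
```

If `balanced` were false, the caller would find its own saved values at the wrong offsets after the function returns.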

Function stack frame

Stack frame diagram

Nested function calls

outer_fn:
  push {r0,lr}
  bl middle_fn
  pop {r0,lr}
  bx lr

middle_fn:
  push {r0,lr}
  bl inner_fn
  pop {r0,lr}
  bx lr

inner_fn:
  @ do inner function stuff
  bx lr

Nested stack frames

the sp “zippers” up and down as the program executes
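The zipper effect can be traced with a toy Python version of the three functions, where each `push {r0,lr}` moves sp down by 8 bytes. The starting sp value is an assumption for the example.

```python
sp = 0x20018000
sp_trace = []

def inner_fn():
    sp_trace.append(("inner", sp))       # leaf function: nothing pushed

def middle_fn():
    global sp
    sp -= 8                              # push {r0,lr}
    sp_trace.append(("middle", sp))
    inner_fn()
    sp += 8                              # pop {r0,lr}

def outer_fn():
    global sp
    sp -= 8                              # push {r0,lr}
    sp_trace.append(("outer", sp))
    middle_fn()
    sp += 8                              # pop {r0,lr}

outer_fn()
print([(name, hex(addr)) for name, addr in sp_trace], hex(sp))
```

The trace shows sp stepping down once per nested call that saves registers, and the final value shows it back where it started.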

There’s lots more to say…

there’s more you can put in your stack frame (e.g. frame pointer fp)
ARMv7/AAPCS is pretty register-heavy (other ISA/CCs use the stack more, e.g. for parameter passing and return addresses)
an optimizing compiler will almost certainly not generate the code you expect
recursion is an interesting case (wait till lab 7)

These are all conventions

It’s the programmer’s job to adhere to them: the operating systems programmer, the compiler programmer, the library programmer, the application programmer, …

For bare-metal assembly programming, you’re all of those
