Week 7: data structures
You’ve made it to the second half of the course! Hooray! Now that you’re here you should be prepared for a few more “gaps” in the lab where you should figure things out yourself or with your colleagues. The tutors are always here to help out with hints!
Outline
Before you attend this week’s lab, make sure you remember how to load and store data to/from memory using ldr & str with different addressing modes.
In this week’s lab you will:
- use a simple data structure for storing dot-dash “morse code” codepoints
- store multiple morse codepoints in an array-like structure in memory
- write a program to read an ASCII string and “output” it as morse code blinks
Introduction
Morse code is a simple communication protocol which uses “dots” and “dashes” to represent the letters of the alphabet.
The dots and dashes can be represented in different ways—as dots or lines on a page, as short or long beeps coming out of a speaker, or hidden in a song on the radio to reach kidnap victims, or as short or long “blinks” of an LED on your microbit.
In this lab content the morse code will be represented visually using a sequence of . (dot) and _ (dash) characters, but by the end of the lab you’ll be sending morse code signals by blinking a red LED on your microbit in short (dot) and long (dash) bursts. Here’s the full morse alphabet (courtesy of Wikipedia).
Discuss with your neighbour—have you ever seen (or even used!) morse code before? Where/when?
Task 1: an LED utility library
Remember the setup code you wrote in the blinky lab? Now that you’ve got some more skills with functions under your belt, you can probably imagine how you might package up a lot of that load-twiddle-store stuff into functions to make the code a bit more readable (this was actually proposed as an extension exercise at the end of the last lab).
In fact, we have already written this library for you! You can find it in the lib/led.S file after you fork & clone the lab template from GitLab.
Keen-eyed students may have noticed that we have other new additions to the lib folder.
- lib/util.S: A library containing some functions for basic bit-ops and other similar functionality that we don’t want to have to write out each time we need them.
- lib/symbols.S: A library containing a bunch of definitions of commonly used memory locations and offsets for the microbit.
lib/led.S relies on both of these libraries, and you can use them too!
Have a read through the code in led.S. You should now be at the stage where you can look at assembly code like this and at least get a general sense of what it does and how it works. Here are a couple of things to pay particular attention to as you look over it.
- The code uses push (to decrement the stack pointer sp and store the value in a register onto the stack) and pop (to load the top value on the stack into a register and increment the stack pointer sp). You can do this in other ways (e.g. stmdb sp!, {lr}) but push and pop are convenient when you want to use sp to keep track of the stack. You can see the push/pop instructions in Section A7.7 of your ARMv7 reference manual.
- Some (but not all) of the functions take arguments (described in the comments), so before you call these functions make sure you’ve got the right values in the right registers to pass arguments to the functions.
- The .global init_leds, ... line is necessary because the library code lives in a separate file to the one where the rest of your program will be (src/main.S). By default, a label is only visible within the file that defines it, so if you try to branch (bl) to one of these functions from main.S the build will complain that the label doesn’t exist. Marking these functions as .global makes their labels visible to the other source files in the project, so they can be called from a different file to the one they’re defined in. Finally, the .global labels in a file are a good clue about which functions are useful to call from your own code. Take a moment to have a look over what the various lib files offer for you.
- You may notice things like ldr r0, =ADR_P0 and wonder why it’s not a number after the =. ADR_P0 is a symbol declared in lib/symbols.S using the .set directive. This is just a convenient way of naming some constant values.
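To make the points above concrete, here is a minimal sketch of a function which combines push/pop, .global and .set. The names delay_twice and EXAMPLE_DELAY are made up for illustration (they are not part of the lab template), and the sketch assumes that the delay function takes its cycle count in r0 and may overwrite the scratch registers; check the comments in the lib files for the actual conventions.

```asm
.syntax unified

@ Hypothetical constant, named with .set in the same style as the symbols
@ in lib/symbols.S (this name and value are for illustration only).
.set EXAMPLE_DELAY, 0x400000

@ Export the label so that code in other source files can "bl" to it.
.global delay_twice

@ delay_twice: calls delay two times in a row.
@ Assumption: delay takes a cycle count in r0 and may change r0-r3.
.type delay_twice, %function
delay_twice:
  push {lr}                @ "bl" overwrites lr, so save our return address
  ldr r0, =EXAMPLE_DELAY
  bl delay
  ldr r0, =EXAMPLE_DELAY   @ reload: delay may have changed r0
  bl delay
  pop {lr}                 @ restore the saved return address
  bx lr
```

Without the push/pop pair, the first bl would overwrite lr and the final bx lr would no longer return to the caller of delay_twice.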
Discuss with your neighbour—what are the advantages of using the functions in the LED library? Are there any disadvantages?
Your task in Task 1 is to use the functions from the led.S library to write three new functions in your main.S file:
- blink_dot, which blinks an LED (or LEDs) for a short period of time (say 0x400000 cycles; we’ll call this the “dot length”) and then pauses (delays) for one dot length before returning
- blink_dash, which blinks the LED for three dot lengths and then pauses (delays) for one dot length before returning
- blink_space, which doesn’t blink an LED, but pauses (delays) for seven dot lengths before returning
Each of these functions will contain nested function calls (i.e. calls to delay or other functions) so make sure you use the stack to preserve the link and argument registers (e.g. with push and pop) when necessary.
Once you’ve written those functions, write a main loop which blinks out the sequence ... _ _ _ on an endless repeat.
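As a starting point, blink_dot might be sketched like this. The labels led_on and led_off are placeholders, not functions from the lab template: replace them with whatever calls into led.S switch your chosen LED on and off. The sketch also assumes that delay takes its cycle count in r0.

```asm
.syntax unified

.set DOT_LENGTH, 0x400000  @ the "dot length" suggested above

@ blink_dot: LED on for one dot length, then off for one dot length.
.type blink_dot, %function
blink_dot:
  push {lr}                @ we make nested calls, so save the link register
  bl led_on                @ placeholder: turn the LED on using led.S
  ldr r0, =DOT_LENGTH
  bl delay                 @ keep the LED on for one dot length
  bl led_off               @ placeholder: turn the LED off using led.S
  ldr r0, =DOT_LENGTH
  bl delay                 @ pause for one dot length
  pop {lr}
  bx lr
```

blink_dash and blink_space can follow the same shape, with the delay arguments changed to three and seven dot lengths respectively.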
Copy the code into tasks/task-1.S then commit and push your changes to GitLab.
Task 2: a morse data structure
Now it’s time for the actual morse code part. In morse code, each letter (also called a codepoint) is encoded using up to five dots/dashes. For example, the codepoint for the letter B has 4 dots/dashes: _... while the codepoint for the letter E is just a single dot (.). You could store this in memory in several different ways, but one way to do it is to use a data structure which looks like this:
Each “slot” in the data structure is one full word (32 bits/4 bytes), so the total size of the codepoint data structure is 4*6=24 bytes. The first word is an integer which gives the total number of dots/dashes in the codepoint, while the remaining 5 boxes contain either a 0 (for a dot) or a 1 (for a dash).
What will the address offsets for the different slots be? Remember that each box is one 32-bit word in size, but that memory addresses go up in bytes (8 bits = 1 byte).
Here are a couple of examples: codepoint B (_...) and codepoint E (.).
In each case, the “end” slots in the data structure might be unused, e.g. if the codepoint only has 2 dots/dashes then the final 3 slots will be unused, and it doesn’t matter if they’re 0 or 1. These slots are coloured a darker grey in the diagrams. (If this inefficiency bums you out, you’ll get a chance to fix it in the Extra Tasks section after the main exercises.)
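In assembly, the two example codepoints could be written out as follows. The labels codepoint_B and codepoint_E are names chosen for this sketch; the comment row shows the byte offset of each slot from the base address.

```asm
.data
.align 2                   @ start on a word boundary

@ byte offset:  +0  +4  +8  +12 +16 +20
codepoint_B:
  .word         4,  1,  0,  0,  0,  0   @ _... : size 4, then dash dot dot dot
codepoint_E:
  .word         1,  0,  0,  0,  0,  0   @ .    : size 1, then a single dot
```

In codepoint_B only the last slot is unused; in codepoint_E the last four slots are unused, so their values do not matter.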
Your job for Task 2 is to write a function which takes (as a parameter) the base address (i.e. the address of the first slot) of one of these morse data structures and “blinks out” the codepoint using an LED.
As a hint, here are the steps to follow:
- pick any character from the morse code table at the start of this lab content
- store that character in memory (i.e. use the .data section) using the morse codepoint data structure shown in the pictures above
- write a blink_codepoint function which:
  - takes the base address of the data structure as an argument in r0
  - reads the “size” of the codepoint from the first slot
  - using that size information, loops over the other slots to blink out the dots/dashes for that codepoint (use the blink_dot and blink_dash functions you wrote earlier)
  - when it’s finished all the dots/dashes for the codepoint, delays for 3x dot length (the gap between characters)
Since the blink_codepoint function will call a bunch of other functions, make sure you use the stack to keep track of values you care about. If your program’s not working properly, make sure you’re not relying on something staying in r0 (or any of the scratch registers) between function calls!
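One possible shape for blink_codepoint is sketched below. It is not the only way to do it: this version keeps its loop state in r4 and r5 rather than on the stack, which assumes that blink_dot, blink_dash and delay leave r4 and r5 unchanged (the usual ARM calling convention, but check your own functions). It also assumes that delay takes its cycle count in r0.

```asm
.syntax unified

.set DOT_LENGTH, 0x400000

@ blink_codepoint: blinks out one morse codepoint.
@ r0: base address of a codepoint data structure
.type blink_codepoint, %function
blink_codepoint:
  push {r4, r5, lr}        @ save the registers we are about to use
  ldr r5, [r0]             @ r5 = number of dots/dashes (slot at offset 0)
  add r4, r0, #4           @ r4 = address of the first dot/dash slot
blink_codepoint_loop:
  cmp r5, #0
  beq blink_codepoint_done @ no dots/dashes left to blink
  ldr r0, [r4], #4         @ load the slot, then advance r4 by one word
  cmp r0, #0
  beq blink_codepoint_dot
  bl blink_dash            @ slot was 1: dash
  b blink_codepoint_next
blink_codepoint_dot:
  bl blink_dot             @ slot was 0: dot
blink_codepoint_next:
  sub r5, r5, #1           @ one fewer dot/dash remaining
  b blink_codepoint_loop
blink_codepoint_done:
  ldr r0, =(3 * DOT_LENGTH)
  bl delay                 @ gap between characters
  pop {r4, r5, lr}
  bx lr
```

Note that the value loaded into r0 is compared immediately and never relied on after a bl, since the called functions are free to overwrite r0.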
When you start to use functions, the usefulness of the step over vs step in buttons in the debugger toolbar starts to become clear. When the debugger is paused at a function call (i.e. a bl instruction) then step over will branch, do the things without pausing, and then pause when the function returns, while step in will follow the branch, allowing you to step through the called function as well. Sometimes you want to do one, sometimes you want to do the other, so it’s useful to have both and to choose the right one for the job.
If you’re confused about what this section is referring to, ask your neighbour / tutor to point them out to you.
Write a program which uses the morse data structure and your blink_codepoint function to blink out the first character of your name on infinite repeat.
Copy the code into tasks/task-2.S then commit and push your changes to GitLab.
Task 3: ASCII to morse conversion
The final part of today’s lab is to bring it all together to write a program which takes an input string (i.e. a sequence of ASCII characters) and blinks out the morse code for that string.
To save you the trouble of writing out the full morse code alphabet, you can copy-paste the following code into your editor. It also includes a place to put the input string (using the .asciz directive).
.data
input_string:
.asciz "INPUT STRING"
@ to make sure our table starts on a word boundary
.align 2
@ Each entry in the table is 6 words long
@ - The first word is the number of dots and dashes for this entry
@ - The next 5 words are 0 for a dot, 1 for a dash, or padding (value doesn't matter)
@
@ E.g., 'G' is dash-dash-dot. There are 2 extra words to pad the entry size to 6 words
morse_table:
.word 2, 0, 1, 0, 0, 0 @ A
.word 4, 1, 0, 0, 0, 0 @ B
.word 4, 1, 0, 1, 0, 0 @ C
.word 3, 1, 0, 0, 0, 0 @ D
.word 1, 0, 0, 0, 0, 0 @ E
.word 4, 0, 0, 1, 0, 0 @ F
.word 3, 1, 1, 0, 0, 0 @ G
.word 4, 0, 0, 0, 0, 0 @ H
.word 2, 0, 0, 0, 0, 0 @ I
.word 4, 0, 1, 1, 1, 0 @ J
.word 3, 1, 0, 1, 0, 0 @ K
.word 4, 0, 1, 0, 0, 0 @ L
.word 2, 1, 1, 0, 0, 0 @ M
.word 2, 1, 0, 0, 0, 0 @ N
.word 3, 1, 1, 1, 0, 0 @ O
.word 4, 0, 1, 1, 0, 0 @ P
.word 4, 1, 1, 0, 1, 0 @ Q
.word 3, 0, 1, 0, 0, 0 @ R
.word 3, 0, 0, 0, 0, 0 @ S
.word 1, 1, 0, 0, 0, 0 @ T
.word 3, 0, 0, 1, 0, 0 @ U
.word 4, 0, 0, 0, 1, 0 @ V
.word 3, 0, 1, 1, 0, 0 @ W
.word 4, 1, 0, 0, 1, 0 @ X
.word 4, 1, 0, 1, 1, 0 @ Y
.word 4, 1, 1, 0, 0, 0 @ Z
The main addition you’ll need to make to your program to complete this exercise is a morse_table_index function which takes a single ASCII character as input, and returns the base address of the corresponding codepoint data structure for that character (which you can then pass to your blink_codepoint function).
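For example, the letter P is ASCII code 80 and the letter A is ASCII code 65, so P is at index 80 - 65 = 15 in the table (P is the 16th letter). The offset of the P codepoint data structure from the start of the table above is therefore 15 times 24 (the size of each codepoint data structure), which equals 360 bytes.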
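The arithmetic above translates fairly directly into assembly. The following is a minimal sketch of morse_table_index which assumes its input is always an uppercase letter (it does no range checking) and that morse_table is the label from the table above.

```asm
.syntax unified

@ morse_table_index: finds the table entry for an uppercase letter.
@ r0 (in):  ASCII code of the letter ('A' = 65 ... 'Z' = 90)
@ r0 (out): base address of that letter's codepoint data structure
.type morse_table_index, %function
morse_table_index:
  sub r0, r0, #65          @ letter index: 'A' -> 0, 'P' -> 15
  mov r1, #24              @ size of one table entry in bytes
  mul r0, r0, r1           @ byte offset, e.g. 15 * 24 = 360 for 'P'
  ldr r1, =morse_table
  add r0, r0, r1           @ base address = table start + offset
  bx lr
```

Because this function makes no nested calls, it does not need to save lr on the stack.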
So, your main program must:
- loop over the characters in the input string (ldrb will be useful here)
- if the character is 0, you’re done
- if the character is not 0:
  - calculate the address of the morse data structure for that character
  - call the blink_codepoint function with that base address to blink out the character
  - jump back to the top of the loop and repeat for the next character
If you like, you can modify your program so that any non-capital letter (i.e. ASCII value not between 65 and 90 inclusive) will get treated as a space (blink_space).
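Putting the steps above together, the main loop might look something like the sketch below. It assumes that the called functions leave r4 unchanged, that the LED setup from Task 1 has been added where indicated, and that the program should simply stop once the string has been blinked out; adjust these to suit your own program.

```asm
.syntax unified

.global main
.type main, %function
main:
  @ (LED setup from Task 1 goes here)
  ldr r4, =input_string    @ r4 = address of the next character to read
main_loop:
  ldrb r0, [r4], #1        @ load one byte, then advance r4 by one
  cmp r0, #0
  beq main_done            @ 0 is the terminator added by .asciz
  cmp r0, #65
  blo main_space           @ below 'A': treat as a space
  cmp r0, #90
  bhi main_space           @ above 'Z': treat as a space
  bl morse_table_index     @ r0 = base address of the codepoint
  bl blink_codepoint
  b main_loop
main_space:
  bl blink_space
  b main_loop
main_done:
  b main_done              @ finished: loop here forever
```

The string pointer lives in r4 rather than r0 because r0 is reused for the character and then for the table address on every pass through the loop.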
Write a program which blinks out your name in morse code.
Copy the code into tasks/task-3.S then commit and push your changes to GitLab.
Extra Tasks
There are many ways you can extend your morse program. Here are a few things to try (pick which ones interest you—you don’t have to do them in order):
- can you modify your program to accept both lowercase and uppercase ASCII input?
- the current morse_table doesn’t include the numbers 0 to 9; can you modify your program to handle these as well?
- can you remove the need for the number of dots/dashes in each table entry altogether?
- this is far from the most space-efficient way to store the morse codepoints, can you implement a better scheme?