Hello C64

The Commodore 64 was one of the first computers I played around with. I never had one myself until recent years, but a friend of mine had one and we good times play games on his little breadbox. So I have been looking forward do learn more about the hardware of the C64.

Before we jump into the code, let me just say that I started with the code a few years ago, but I was stumped by some erratic behaviour of the system. I thought it was strange the way the 64 behaved, and thought it was just a weird machine. But last week when I picked up the code again, I realised there was a silly bug in my code that wrote zeros to more or less random locations in RAM. That is why it behaved strange, and when the bug was taken care of, I learned to enjoy my coding sessions.

The code

This time, I display Hello World in two different graphics mode, switching between them every few seconds. The two modes I used where both bitmap modes, so no character based modes, however the layout of the screen RAM is based on blocks of 8 by 8 pixels, or 4 by 8 in the Multi colour bitmap mode. Please note that the following code have had comments removed so I

Let’s start by checking out the assembly code. For this example code, I decided to run the program as a tokenised BASIC program. This kind of executables are always loaded at address $801, and contains BASIC tokens that will be executed BASIC interpret when you type run on the C64. Since I write all my code in assembly, the first few bytes are representation of the BASIC program 10 sys 2064.

Since we told the interpreter to jump to address $810, we tell the assembler to put this next segment at address $810. The we disable interrupts and start executing out main loop. This part is a bit different from the previous implementations I’ve made in this project. Since I wanted to try out more than one graphics mode, I separated the two put putting them in different subroutines and just jump to them after each other in a loop.

Standard bitmap mode

The first routine we execute is using the Standard bitmap mode, which can display 320 by 200 pixels, using 2 colours in every 8 by 8 block. In this mode, the pixel data are spilt into two areas, one where the 8×8 block data is stored, and in the other area we store what two colours should be displayed in each of the blocks. The Hello World graphics data is pretty boring, so the colour data will be set to the same two colours for all block of the screen. The classic white text on black background.

I decided to store the block data at address $2000 and the colours at $400, which you will see later in the code.

Let’s look at some code.

This bit of code is setting up the data for the graphics, but why are we writing 0 to $d011 and $d020?

On the C64, the memory area $d000 to $d3ff can be mapped to the VIC-II chip, which is the chip that handles the graphics on the 64. At $d011 we find the Control register 1. Together with Control register 2 that we will see more of later, this register controls display mode and hardware scrolling. Setting bit 4 to 0 will disable displaying graphics, setting the whole screen to the border colour. Speaking of the border, address $d020 in where the border colour is stored, and setting it to 0 means border is set to black.

So clearing the two memory addresses means the display will be all black.

When the screen is black, we jump to two different subroutines, one that clears the block data memory at address $2000, and the other will set the colours for all blocks to whatever register a contains. In this case, setting a to $10 means a bit with value 0 in the block data will be black (the low nibble of $10) and a 1 bit will be white (colour 1 in the C64 palette, from the high nibble of $10).

When colour RAM has been set up, we copy a bunch of data (defined later on) into the screen memory at $2000.

Now it’s time to switch to the correct display mode.

The register at address $dd00 is for setting what part of the RAM is visible to the VIC-II. The VIC-II can only access 16K of RAM at a time, and setting this register to 3 will expose addresses from 0 to $3fff. The next few lines configures the VIC-II. As I stated before, register $d011 is Control register 1 and its companion register Control register 2 has the address $d018.

Writing $3b to $d011 will set the vertical scroll to 3, screen height to 25 rows (200 pixels), enable the screen and switch to Bitmap mode. Writing $08 to $d016 sets the horizontal scroll to 0 and setting the screen width to 40 columns (320 pixels).

Register $d018 tells the VIC-II from where the graphics data should be read. 8 in the low nibble sets the block pixel data to be read from $2000, and 1 in the high nibble, as you might have figured out will set the colour data to be read from $400.

This last bit just calls wait_vbl 128 times before returning from the subroutine.

Multicolour bitmap mode

In the multicolour mode pixels have double width, so each block of data consists of 4 by 8 pixels, but in the mode each blocks can contain 4 different colours. The first colour is defined in the background colour register $d021, so this is shared between all blocks. The next two colours are defined in the same way as the colours in the Standard bitmap mode, and the final colour is read from the hardwired Colour RAM at $d800 to $dbe7. In this example we only use two colours.

Much of the code here is the same as for the standard bitmap mode, however I changed the graphics a bit, so it now consists of two rows of data, and also we offset the graphics a bit so it’s drawn on the right side of the screen.

First, we set the border and background colours to grey by writing $b to $d020 and d021. It is followed by setting up the VIC-II registers. The only difference from the standard mode is that we set the bit to enable multi colour mode in $d016. Finally we wait for 128 frames before returning, just like before.

Help routines

Here comes the help routines we called from the code above, for clearing screen, filling colour data and waiting for VBL.

Filling the the area we chose for block colour is pretty straight forward. We actually clear 24 bytes more than we need just to get simpler code.

The clear_screen subroutine clears the blocks of pixel data. The screen data is almost 8KB, and this method of clearing data is not very fast, but the amount of code is relatively small. We’re using memory at $fb and $fc as a base pointer for the screen. The first four instructions just store $2000 at those locations. Following that we setup register x for counting and y as an index register used as an offset from the base address.

The clearing itself uses the (<address>),y addressing mode. The effective address is calculated by taking the 16 bit address at <address> and adding the y register. First we loop 256 times and clear addresses $2000 to $20ff. Then we increase the byte at address $fc, which is the high byte of the screen memory, decrease register x and starts clearing the memory at $2100 unless x reached 0.

Now time for a small anecdote. I started writing this example a few years ago, but I was so confused about the result I got. The C64 seemed to be so random and I could not understand why the code was behaving to erratic. When I picked up the code again a few weeks ago, I realised that I was using the (<address>),x addressing mode. Using register X instead of Y might seem like a small thing, but the effective address is calculated in a totally different way. Instead of reading the address from $fb and $fc, and the adding the y register, it adds the x register to $fb and $fc and uses the address from that location. This meant I was clearing more or less random bytes all over the RAM instead of the contiguous bytes in screen RAM. When I fixed the bug I found that the C64 made much more sense.

One last subroutine before the graphics data.

There are probably other ways to wait for a vertical blank on the C64, but this method was pretty simple. The highest bit in the register at $d011 contains the high bit of a 9 bit counter that is increased every scan line. Since there are more than 255 scanlines, and the value is reset to 0 between frames, we know that when this bit goes from 1 to 0, a new frame is about to be drawn.

First we load register a with $80, which has only the high bit set. Then we loop until the value in the high bit is 1, followed by a similar loop until the high bit is 0 and that’s it.

Graphics data

Not sure how interesting this is, but I’m going to include it just for the sake of it.

And that is all. I enjoyed learning about the C64 hardware and I might return to it soon to make something more advanced than just a Hello World hack.

Coming up next is probably another Atari console.


Hello Sega Mega Drive

After a long absence, I’m back with a long overdue post about the Sega Mega Drive. The code itself was completed over 4 years ago, so my memory of it is a bit hazy. I have however been writing new MegaDrive code recently so this is the perfect time to finally write this post.

The Sega Mega Drive (MD) is based on the Motorola 68000 CPU and its graphics chip (VDP) is a more advanced version of the VDP in the Sega Master System. The main CPU have 64K of memory, and the VDP as an extra 64K internal RAM for storing graphics, sprite positions, color palettes and more.

The Code

The code in this section is not the complete code, and it is moved around a bit compared to the original to make it more coherent when explaining. With that said, let’s dive into it.

This first section starts with defining some constants required later in the code. The VDP is the Video Display Processor, and it is access through two different addresses. Address $c00004 is used for writing control commands to the VDP. When you want write data to the VDP memory, you first write a command to the control port to select what address you want to write to. The actual data is then written to the DATA port at $c00000. More on this later.

The entry point of a normal MD cartridge is located at address $200 (512 in decimal terms). The first 512 bytes of the cartridge contains vectors the CPU use for knowing what code to execute at specific times. For example, at address 4 in the ROM the initial address to execute is stored. It this example that would be $200, since that is where we tell the assembler to locate our code with the org $200 command.

So what does the code at $200 do? When the original Mega Drive was released, some game companies release unlicensed game which made Sega a bit mad. Therefore they made a slight modification to the following hardware revision to require the game write the string ‘SEGA’ to a special address when the console is booted, or the VDP will not display anything on screen. This rendered the old unlicensed game unplayable. But all games released after that required the developers to add this little snippet of code (or something similar) at the start of the game.

What we do is get the console version number from address $a10001. If the version is more than 0, we write ‘SEGA’ to the undocumented hardware register $a14000. This will unlock the VDP to work as expected. Finally we jump to the setup_vdp subroutine.

When the VDP is initialised we return here to upload the graphics and write the tile map to the VDP. The first instruction sets CPU register a0 to $C00000, the address for the VDP data port, which means writing to 4(a0) will write to $C00004, the VDP control port. Except for choosing what address to write to in the VDP, telling the VDP to change the write address after each write is the most common thing you need to do. When writing contiguous data to the VDP RAM, you need to write a 2 to VDP register 15. Why a 2 you might ask? It’s because we write two bytes at a time to the VPD, so the address should be updated by 2. Writing to a register in the VDP is done by setting the high byte of a word to the register number, the low byte to the value you want to set the register to, and then set the highest bit of the word before you write it to the VDP control port. Easy! That’s where the #$8f02 is coming from.

The rest of the code that follows is a loop to clear the VDP RAM. Since the VDP RAM is $10000 bytes long, and we write 4 bytes per move we loop $4000 times. Writing $40000000 to VDP control port tells the VDP to write to address 0 of its main RAM.

Let’s write some color data to the internal color RAM of the VDP. I know I said the VDP has 64K internal RAM, but that is not entirely true. It also has some special memory dedicated to color and scrolling. To tell the VDP we want to write to address 0 of the color RAM, write $C0000000 to the control port. Now we can write the actual color data.

The MD have 4 different palettes of 16 color each. Every tile can use of of the palettes to select what colors to use. For every color there are two bytes to store the RGB values. Three bits are used for each of the RGB elements. The binary representation of the bytes (big endian) would be %0000bbb0ggg0rrr0.

Writing #$00000eee to color RAM address 0 means the first entry is be black, and the second color entry is white.

Now that the VDP RAM is empty, and palette is setup we can copy all the graphics data to the VDP. tile_set is the location in the ROM where the tile graphics is located, and we want to write it to address 0 of the VDP. The copy loop just writes data to the VDP RAM until it reaches the end of the tile data.

When we initialised the VDP (read more about it in the next section) we told the VDP to read the tile map for Scroll A (the MD have to separate planes, called Scroll A and Scroll B, to allow for parallax scrolling) from address $2000 in the VDP RAM. More info on what value to write to the control port to select address can be found here.

The tile map consists of a number of words describing what tile data to use, what palette to use, and other display properties. The lowest 11 bits contains the tile number, which is the only data we are interested in in this code. Our ‘Hello world!’ graphics consists of 6 tiles, which should be displayed in order to make any sense, so we write 1 through 6, combined into 32 bit writes to the address the Scroll A plane is located in the VDP RAM.

When the tile map is written, we are all done so we can stop doing anything useful by halting the CPU until there is an interrupt, and the just halt it again until the end of times. Or the power is cut.

VDP init and data

The VDP have a set of registers that controls how it displays the graphics. This loop reads the configuration from the data defined at memory vdp_regs and writes it to the VDP control port. Simple enough, but the important thing are the values we write to the registers.

The VDP register data contain all necessary configuration of the VDP. To have anything at all display, register 1 (mode register 2) is the most important, since it contains a bit to enable the display altogether. Registers 2, 4, and 6 selects at what address in VDP ram it should fetch graphics data for rendering, so those are also a bit more important than some others.

For a full description of each register you should, read more at the wiki at megadrive.org.

Finally comes the tile data. The tiles consists of 8 by 8 pixels of 16 color indices. That means each byte holds color data for 2 pixels, and in total 32 bytes is required for every tile.

The data below might look a bit strange since it consists of only ones and zeros. It almost seems like binary data, but it is hexadecimal. As we only use palette entry 0 and 1, it wouldn’t make sense to have any other values in the data.

That is is. We are done. Finally. Hopefully it will be less than 3 year until next entry.


Atari hardware project idea

A few days ago, I started thinking about how difficult it’s becoming to find a good monitor for my Atari, especially one that I can easily bring with me on demo parties and such.

Then it occurred to me that my CosmosEx that is mounted inside the ST have a Raspberry Pi inside it, with both an HDMI and a composite video out. What it also have is a high speed connector for camera hardware, capable of streaming HD video in realtime. What I realised is that, with some custom hardware with an FPGA one could get the video signal directly from the shifter and feed it to the Pi in realtime. All it would take is a pretty simple program on the Pi to display the ST:s video output on any screen.

After studying the schematics for the STe and ST, the respective shifter seems to output 4bit (for STe, 3bit on ST) digital RGB values that is then turned into an analog signal. Hooking into these pins should be fairly simple, and together with the pixel clock and the sync signals, an FPGA should be able to convert the color data and send it as an image through the CSI-2 port to the Pi.

My problem is that I know very little of FPGA and hardware development. I would like to be able to build this by myself, but if some pro could help out, that would be awesome.

demicode on twitch!

I recently got the idea to stream some of my programming sessions on twitch, and yesterday I created a twitch account for this purpose. I don’t know if there ever will be a real schedule, but I’ll try to stream as often and regularly as I can.

The current project I’m coding on the stream is a new game for the Atari STe that I got inspired to do the past weekend. The goal is to make a small metroid-vania style game, with pretty simple gameplay. I got the idea after once again finishing the excellent DARC for Sega Master System.

The game will be very short but if people like it and my interest don’t disappear, I will instead make a sequel with maybe more content. I’m keeping my ambition low so I don’t get overwhelmed by expectation. Keep your fingers crossed that it will get finished eventually.

Hello Game Boy

This post has been postponed for a very long time. I started writing it almost 3 years ago, but due to a bug in the code I never got to finish it. When I was done with the Atari 8bit post, I started looking at this code again to find out what was wrong. The issue was found and fixed so now I present you with Hello World for Game Boy.

The Code

The Gameboy CPU is based on the Z80 but have some small but significant changes to the instruction set. Some instructions are removed while other have been added. Most removed instructions have to do with I/O ports, but since the Game Boy have memory mapped I/O, they were not needed anyway. The added instructions makes more sense with the memory mapped I/O, you’ll see why soon enough.

This time I leave out the header and setup code from the blog post. If you want to try the code yourself, grab the source from bitbucket instead of trying to copy it from this post.

During the boot sequence the built-in ROM code will verify the cartridge header with some checksums and display the nintendo logo before jumping  to address $100. The only code at that address is a jump to $150, where we start to go through the code.

The first instruction is di, to disable interrupts before we switch off the LCD display.

According to this page, you should never disable the LCD outside the Vertical Blank period, because that could physically break the Game Boy hardware. So the loop above waits for the VBL before switching of the LCD.

The ldh instruction is one of the special instructions in the Game Boy. What it does is add $ff00 to the number between the parenthesis. So ldh a,($44) will fetch the value from address $ff44 and store it in register a. As it happens, the I/O ports are mapped to the address range $ff00-$ff7e, so this new instruction makes perfect sense. At address $ff44 we can find the current line that is currently sent to the LCD. The line 144 is the start of the VBL period, so subtract 144 from the register value, and jump to the nearest label named – (minus sign) above the current instruction if the result was not zero.

The following block of code will clear the memory used by the tile map that decides what is drawn on screen. Since our code is entered when the nintendo logo is displayed, this little loop will clear it.

The tile map can be found in the range $9800-$9bff, and we use another of the special instructions to clear it. ldd works like a normal ld instruction, except it also decreases the value of hl after the load. So ld (hl),a  will first copy the content of register a to where hl register pair points to, which will be $9bff the first iteration, and then decrease hl to $9bfe. To decide when all the memory is cleared, I grab the high byte of the pointer from h, subtracts $97 and check if the result is zero. If it is, that means the hl register just went from $9800 to $97ff, and the whole tile map has been cleared.

The next step is to copy our tile data to the tile data RAM. In this example, $8000 is the address for the tile data, so we load the hl register with that, and the we setup de register pair with the source address, which is at label gfx defined later in the code. As you may know, the original Game Boy is capable of displaying 4 different colours, but since I’m a lazy being I only made the graphics with one colour. Tiles are 8 by 8 pixels, using 2 bits per pixel giving 16 bytes per tile. The source data is only 1 bit per pixel; 8 bytes per tile and since we have 7 tiles, the first being an empty tile, we setup register b with the number of bytes to copy and use it as a counter.

When the counter is setup, we start the loop by fetching data from the source, then using the ldi instruction to store it to the destination twice. ldi is similar to ldd except that it increases hl instead of decrease after the load is done. This way our 8 bytes per tile source data will be written twice to the destination filling both bitplanes every line with the same source data. Then we increase the source pointer, and decrease the counter and to the looping magic until the counter reaches 0.

Now that the tile data is written, we can start using it by writing tile indices to the tile map memory. As before, we load the tile map address, set up b as a counter but this time we also load a with a 1. This is because we uploaded an empty tile as tile 0. Now we loop to write 1 to 6 to the first 6 entries in the tile map.

Almost done now. The only thing that is left is to enable the LCD again and loop forever. The highest bit we set is to enable the LCD, bit 4 is to select address $8000 to be the tile data address, and bit 0 is set to enable the background graphics, which is what we have been setting up to display our hello world text. The other bits in the LCD controller register are used to enable sprites and chose base address for tile map amongst other things.

When the LCD is enabled again, we halt the CPU to save batteries and jump back to halt again if it would ever wake up from the halt instruction.

Only the graphics data left. I’m starting to feel that this section is more or less implicit. This time I compacted it to hexadecimal syntax, with on tile per data row to reduce the space it took in the source code.

That’s it.

You might be curious of what bug delayed this post for almost three years. Well, it turned out I had missed the fact that you can’t write to the video RAM while it’s accessed by the LCD controller, so my tiles were only partially copied and the graphics were garbled.

Next up, I will probably write a post on the Mega Drive as the code was completed over a year ago. Until then, keep on coding!


  • GbdevWiki, where I found all information I could need
  • WLA-DX, used for assembling and generating a ROM.

Hello Atari XE

Time for a new CPU, and what better way to get starting with the 6502 that with the 8 bit computers from Atari. Sure, you might say that C64 or even NES would be more obvious platforms, but not for me. I am Atari to the bone, and despite C64 being more popular in some parts of the world, I don’t own one. I do own an Atari 130 XE though, and I recently purchased an SIO2SD device from Lotharek which enables me to actually test my code on the real thing.

The 8bit line of Ataris actually have more in common with the Amiga than the 16 bit Ataris. In the mid ’80s, the owners of Atari and Commodore more or less swapped companies, so the hardware designers who built the Atari 8bits later designed the Amiga hardware. And it shows. More about that in a bit.

This example uses the mads assembler that can produce an .xex file which is a loadable binary for the Atari 8bit. And even though the post is called Hello Atari XE, the following code should run on any 8bit Atari with OS/B or later. All XL and XE models should run it.

The code

Lets jump in to the code. We start by telling mads to generate a header for a loadable file, and set the start address to $2000.

The first thing we do when executing is loading the accumulator register, a with 0 and store them at address $d40e and $d400. Both of these memory locations are registers in the ANTIC chip. The first disables sources for the non-maskable interrupts and the later switches off the DMA for graphics. Documentation of the ANTIC can be found on Wikipedia. Also note the start label on the first line. This is later referenced as the entry point of the program.

Before going any further I must add a disclaimer. This code is not system friendly, and uses direct access to hardware registers. Most registers and vectors also have shadow registers that gets copied to the real registers during the system vertical blank (VBL) interrupt.

Since we write directly to the IO registers without using the shadow registers, we must install my own VBL handler to avoid having my values overwritten by the OS. The OS VBL interrupt handler jumps to a function pointed to by a vector at memory location $222. The following code first get to low byte of the address of the vbl_irq label, stores it at address $222 and then stores the high byte at $223.

Similar to how the Amiga copper lists work, the atari Antic chip executes commands in the list to feed the GTIA chip with graphics data. With out display list there will be no graphics so we set the display list address in the ANTIC chip to point to the one defined further down in the code.

Time for another short intermission, where I go through some features of the CPU. On the 6502, memory locations 0 to 255 are the so called zero-page. Memory in this page can be referred to with only one byte, which makes them faster to access. So for code that needs to be really fast, the zero-page is a good place to store data that is accessed a lot.

One other special page of memory is page 1, with addresses between 256 and 511. This space, and only this space is used by the stack. That way the stack “pointer” can be 8 bits instead of 16, and is more of an index into page 1 rather than an actual pointer.

Now it’s time to copy some graphics to the screen memory. To demonstrate a few different graphics modes, I made a function to copy the graphics to a destination buffer. The parameters to the function are stored at location $b0.w in zero-page ram, as well as in the accumulator register.

The destination, as in the screen memory is stored in $b0.w and the width of the screen memory for that mode is passed in register a.

When the parameters are set up we jump to the copy_1bit_gfx sub routine. That routine is explained in detail later.

Now we call the same sub routine again, twice but with other parameters, copying the graphics further down in the graphics buffer.

The first graphics mode we test takes 10 bytes per scanline, and the graphics consists of 7 line, that is why we offset the destination that many bytes in screen_mem. The second mode has 20 bytes per line, hence the second offset. Oh, and screen_mem is a reference to a label further down in the code.

The CTIA/GTIA chip, accessed at memory location $d000 through $d01f is responsible for generating the video output. It is fed with graphics data from the ANTIC and then adds colour and sprites before it’s output to the screen. The only interaction we need to do with the GTIA chip in this example is to set the colours we use. $d016 contains the colour for pixels in the two first modes in the example, $d01a is the background colour for the first two. $d017 and $d018 are used by the third mode, the first being text colour and the second the background.

We set the the background colours to black, represented by the value 0, and the text colour to white, which is $0e. The high nibble of byte is the colour, and the low nibble is the luminance of the colour. So a 0 in the high nibble will produce different shades of grey depending on the low nibble. The reason it’s an $0e and not $0f is because the low bit in the luminance is not used, which limits the total number of available colours to 128.

Once the data is copied and the colours are set, we are ready to enable VBL interrupt and
start the DMA fetch.  The two least significant bits of the DMA control register also determines how wide the so called playfield area is, which is screen memory. We set it to 2 to indicate normal width. Any other value here would force us to change the value passed in register a to copy_1bit_gfx when copying the graphics. Setting bit 5 (value of 32) will enable the DMA fetch of the display list data.

Since we have no interaction and animations, when can just stop here by having an infinite loop.

Time to dive into the VBL interrupt handler we installed way early in the code. It’s not very much to deep into really. Since this code is jumped to by the OS VBI handler, after it saved the registers on the stack, we need to restore them before leaving the handler.

pla pulls (aka pops) a byte from the stack into the a register. tay transfers the content of register a to register y. You can guess what tax does. rti returs from the interrupt, basically pulling the status register and return addresses from the stack.

Finally we get to the sub routine used to copy the graphics data to the screen buffer. It takes two parameters, the destination address which should be written to address $b0.w, and the number of bytes to skip between each line in register a.

The first thing we do is store the row count at address $b4 and setup a pointer to the graphics data at address $b2. This pointer will be used in the copy loop coming up next.

The above code is were we do the actual copying of data. Register x is used as a row counter, and register y is used as an index in both the source and data buffers. The destination pointer was the one passed at address $b0. In the loop, we read a byte from address stored in $b2 + content of register y, and write the same byte to address in $b2 + content of y. With the dey instruction, which stands for Decrease Y we subtract 1 from y.

The bpl instruction branches to the passed address if the result of the dey was positive, so it will jump to the label byte_loop until y counted down to -1. The we use a similar instruction to decrease x, and if x reached zero, we jump to the copy_done label.

If we reach the next part of the code, it means that one of the first 5 rows were copied to the screen buffer. After row 6 is copied, we jump directly to the copy_done label, so the following code will not be run.

Now it’s time to update the destination pointer to the next line. Since we stored away the line width in $b4, we read it back into the accumulator. As the only add instruction in the 6502 also adds the carry bit, we first need to clear the carry bit. Once the carry is taken care of with the clc (CLear Carry) instruction, we add the low byte of the destination address to the accumulator and write back the result. Since the destination is a 16 bit value, and the add in 8 bit, we need to add the carry to the upper byte of the pointer. So we clear the accumulator and add upper byte of the pointer together with the carry bit to the accumulator, and of course write it back.

The next snippet of code does essentially the exact same thing as the code above, except it updates the source pointer with the number of bytes per line in the source, which happens to be 6.

Almost done! The only thing we need to do now jump back to the row_loop label to copy the next row. And of course we have to declare the copy_done label we jump to after the last row was copied. The rts is a simple ReTurn from Subroutine, pulling the return address from the stack, where it gets pushed by the corresponding jsr instruction.

And there we go, all code is done. But there is perhaps the most essential part left…

The display list

The way to tell the ANTIC how to produce graphics data is through the display list. Similar to the Amiga copper list, the list contains instructions that tells the ANTIC what graphics mode to use and where to fetch it’s data, plus some extra bling. With no further ado, here it is.

Most instructions in the display list consists of one byte, but some consist of three. As you can see, the list starts with six bytes with the value $70. That instruction will yield 8 empty lines, so at the top of the screen there will be 48 empty lines.

What comes next is a little bit more interesting. If the low nibble of an instruction is higher than 1, that nibble represents a graphics mode. In this case, mode $9 is a bitmapped 2 color mode with 80 pixels per scan line (in normal width, described above when enabling DMA). The $4 in the high nibble tells the ANTIC to use the next two bytes as the source address to fetch data for the graphics. The next line declares a word value, containing the address of screen_mem.

For every line in the bitmapped mode, we need to tell the display list what mode to use, so we declare 6 more lines of mode 9, since our graphics data is 7 lines high. The following 14 bytes declare 7 lines of mode $b (same as 9, but double the resolution), and 7 lines of mode $f. The last mode being a bit special. It uses another set of palette entries (that we set up earlier), and also does some intentional colour bleeding. As an effect, we get double the resolution of the previous mode, but with some strange colours. In an emulator it doesn’t look good at all, but on a real machine, the bleeding effect does what it’s supposed to and it looks pretty nice.

The last instruction of our display list is $41. The 1 in the low nibble indicates that this is a jump instruction, which means that the following two bytes will be used as the new display list pointer. The $4 in the upper nibble tells the ANTIC to stop serving more data until the next frame.

Data and screen buffer

Not much here but the over-used Hello world text, in 1 bit graphics.

At the very end of the program, we define the label screen_mem. This means that the screen buffer will start at this address when the program is loaded.

Finally there is a run directive for the mads assembler, that tells it to add section in the binary which tells the OS that loaded the application to jump to a specific address, in our case the start label, that we declared at the very top of the program.

That’s it!

Reference and links

PS. Using < and > in the code is really annoying. Either WordPress freaks out or the syntax highlighter does. If you find any &lt; of &gt; in the code, just mentally exchange them for < and >. Thanks.

Summer activities

Hey, I made a couple of Atari ST intros this summer. First up was Bacon, which was released at Sommarhack 2015. I started to plan it when there was still snow outside, but I did almost all of the coding the during the week before the party.

The release was thrown together for High Coast Hack in Härnösand four weeks later. As I had participated with an intro the last to years, I wanted to have something this year as well. So I grabbed some old code and a sound sample, re-coded it for overscan and added an intro pic. As the previous intro, most of the coding was done during the week before the compo. I called it C-c-compo filler.

That’s it for now!

Work in progress

This blog has not seen much action lately, but my coding has been more or less constant. There are code for a few more platforms in my Hello World project, and a few month ago I started writing some code for a SEGA Mega Drive game. It’s no secret that the SMD are one of the platforms that now have a Hello World implementation. I just need to take time to finish up a blog post about it. There are actually two other posts in the pipeline, for the Atari 8bits and Gameboy, even though the Gameboy code is far from working perfectly.

I recently purchased a Skunkboard, which is a development cartridge for the Atari Jaguar. The Jaguar hardware is pretty cool, even though it’s far from bug free, it seems really cool and I can’t wait to get some time free to do some actual work on it. One of my colleagues at work started with a cross platform game project for the Mega Drive and Amiga with a friend of his, and I got the idea to port the game for the Jaguar. I’ll have to wait a while until their code is in a more stable state before I can start porting, but it will be a very interesting project.

Now I better get started with the jumping and colliding code for my Mega Drive game.

Hello Gameboy Advance

Last weekend while digging around on one of my USB sticks, I found some old test code I wrote for the GBA over 10 years ago. I though the files were lost forever, so I was quite happy that I found them. The code itself was written in an obscure assembler for Windows, so my first goal was to port it to a more modern tool that preferably could cross assemble on both my PC and Mac. I decided to give the vasm a try. After some initial problems with strange syntax and me not understanding what the error messages meant, I got it to assemble into a binary file. To my surprise, the resulting binary could be loaded into the emulator and it worked the way it should! From there I cleaned up the code, removing stuff I didn’t need for the Hello World Project and added the result to the repository.

The code

This is by far the shortest example to date. The video mode I chose made it very easy to draw graphics on screen, and the setup required is minimal. A short note before we continue; the semantics of the data types are not the same as on the 68k processor. On Arm, a word is 32-bit, while on the 68k it’s 16-bit. This can occasionally cause some confusion, at least it has for me when my focus was not high enough.

At the beginning we tell the assembler to use the arm instruction set (which is 32 bit, as opposed to the thumb instruction set that has 16 bit instructions). Then we tell it that the following code will be executed at position 0x08000000, which is where the GBA ROM entry point is.

The first instruction we run makes a jump (branch) over the ROM header. After the branch, we define the GBA header. I have no idea why the values in the header are what they are or if they are correct, but this is what my old code did, and the emulators I’ve tried haven’t complained yet.

Lets begin by setting the display mode. By setting Mode 4, we get 256 colour chunky graphics mode, i.e. each byte in the graphics memory corresponds to the color at that position in the palette. In Mode 4, only background 4 is used, so we need to enabled that background layer.

Time to setup the palette. Since the graphics only consists of two colours, we read both colours from the palette data in one go and writes them to the palette memory. Each entry in the palette consists of two bytes, with packed RGB data, 5 bits per channel. The least significant bits are the red channel, the next 5 bits are green and finally 5 bits representing the blue channel. The most significant bit is ignored.

Time to update the screen with some pixels. Here we setup two nested loops, on for row and one for column. For each pixel on each row, a color entry is read from the pixel data and written to screen. To optimize a bit, we handle four pixels at a time, and also, it is not possible to write just one byte to the VRAM; doing so will result in the same value being written to the other byte in the other half of the 16-bit location.

There is one thing in the loop above that might be hard to understand without further explanation. I’m thinking about the [r4,#4]! syntax. What is does is access the data pointed to by r4 + 4 bytes; the exclamation mark at the end indicates that the r4 register will be updated with the same offset that was used in the access. So, 4 will be added to r4 after the data access. r3 and r4 will therefore increase after each read/write, so the source and destination addresses will be updated for every iteration in the loop.

We have reached the end of the executable code and here we just enter an infinite loop by jumping to the same location forever.

Now all code is done and we reach the data. First we have two entries of colour data; black and white. These are the values that are written to the palette register.

And finally the graphics data. Same graphics as the other examples, except here we have indexed colour mode, so 0 means colour 0 in palette, and 1 means colour 1 in palette. Simple as that.

That is it. I’m quite fond of the ARM assembly language, it’s quite readable compared to, oh say PowerPC assembler code.


There are many site on the internet dedicated to programming Gameboy and Gameboy Advance. Here are a few links that might be useful. Since the original version of my code was written over 10 years ago, I can’t give links to the resources I used initially, but I hope the links below fill your needs.

Hello Sega Master System

Long overdue, it time for another Hello World hack, and this time it’s for the 8bit console Sega Master System (SMS). Based on the Z80 it will be the first system in this series that is not based on the Motorola M68000 CPU. I learned to code the SMS in 2005, and have release two tiny demo hacks under the alias blind io (you can find them here if really want to see them).

The major difference between the SMS and the Atari ST and Amiga is how the graphics hardware works. Most 8- and 16bit consoles, are based on tiled graphics and sprites. What this meas is that the screen is split into blocks of 8 by 8 pixels, and what tile is displayed one of those blocks is read from a tile map. Sprites are also blocks of pixels, but can be moved freely around the screen. The number of sprites are limited and only a few of them can be displayed on the same scanline due to restrictions in the hardware.

The tool used to assemble the code to a ROM image is called WLA DX. It is a great tool and can assemble code and produce ROM images for several different platforms. In this post, I’m going leave out most of the directives that tell the assembler about the output format and such, and focus on the Z80 code. For the full source, you can go the project on bitbucket.

The code

To start off, we tell the assembler to place the following code at address 0. This is where the Master System fetches starts to execute once the logo has been displayed.

In interrupt mode 1, the cpu jumps to address $38 when an interrupt occurs.
Only the VDP can trigger normal interrupts on the SMS, either every VBL and/or
every scanline. Address $66 is where the CPU jumps when a non-maskable interrupt occurs. This happens to be connected to the Pause button on the Master System. For us, lets just ignore the interrupts and return immediately.

Now we can do the thing we came here to do; show some graphics. First we must setup the VDP, which is the video controller. A bit further down in the listing, there is a section of register values for the video chip, and here is the code that writes these values to the data port connected to the VDP.

otir is an interesting command. It actually does several things. First off, it takes the byte pointed to by hl and output it to the port contained in register c and increases hl by one. Then it decreases the value in register b, and if the result is not 0, the program counter is changed to run the same command again.

Once the VDP configuration is done, we upload the graphics data tiles to the tile RAM inside the VDP. The address in the VDP to where we want to place the tiles is $4020 which we output to VDP control port ($bf). The highest two bits are in reality not part of the address, but tells the VDP that we want to write to the video ram (VRAM). By using offset $20 in VRAM, we leave the first tile empty so all unused blocks on the screen are left black. Note: Running this on real hardware will probably leave some junk on screen since the content of the VRAM might not be initialized to 0.

Then we setup the registers for another otir instruction. The port to write data to the VDP is $be, and it’s quicker to decrease $bf by one than to write the new value to the register.

Now we have uploaded the tile data, but we must still set the colour palette. Same procedure as when we uploaded graphics data, by the colour palette address is $c000 (actually the two highest bytes indicate that we should write to colour memory (CRAM), and the following zeros indicate offset in CRAM), and the number of bytes in the palette is 16.

One thing remains, we must tell the VPD what tiles to display at what position. Since we uploaded the tile data to tile number 1 to 6, we should write values 1 to six in the first 6 entries in the tile map. Tile map entries are 16 bit, so we need to write a 0 every second byte. In the code below, we write 0 to register e and writes that after we have written the value in register a, which we use to count from 1 to 6. The last thing we to is just loop forever.

Here comes the VDP setup data. If you want to know what all these bits and values mean, I suggest you go to the development section of the SMS Power homepage in the link at the bottom of this post.

The palette is quite straight forward. RGB data with 2 bits each. This give us a palette of 64 possible colours. 16 entries. There are actually two different palettes you can use on the Master System, one for background tiles and on for the sprites. Since no sprites are used in this short sample code, we only set the tile palette.

Here comes the tile data. It’s the same graphics as with the Atari and Amiga version, but it has been converted to tiles. The tiles are 8*8 pixels with four bitplanes where the first four bytes of data are the four bitplanes for the first row etc. This means that every tile is 32 bytes of data.

That is it really. For more info see the links below. I apologize for any errors and weird stuff in this blog post, it’s also 5 in the morning and I have written most of this post during the night.

The next installment in this blog series might come sooner than you think. 🙂


  • Z80 CPU User Manual – All you need to know about the Z80
  • SMS POWER! – A goldmine for Sega Master System lovers.
  • MEKA – Emulator with debugging possibilities.
  • WLA DX – Multi platform cross assembler