Monthly Archives: May 2016

demicode on twitch!

I recently got the idea to stream some of my programming sessions on twitch, and yesterday I created a twitch account for this purpose. I don’t know if there ever will be a real schedule, but I’ll try to stream as often and regularly as I can.

The current project I’m coding on the stream is a new game for the Atari STe that I got inspired to do the past weekend. The goal is to make a small metroid-vania style game, with pretty simple gameplay. I got the idea after once again finishing the excellent DARC for Sega Master System.

The game will be very short but if people like it and my interest don’t disappear, I will instead make a sequel with maybe more content. I’m keeping my ambition low so I don’t get overwhelmed by expectation. Keep your fingers crossed that it will get finished eventually.

Hello Game Boy

This post has been postponed for a very long time. I started writing it almost 3 years ago, but due to a bug in the code I never got to finish it. When I was done with the Atari 8bit post, I started looking at this code again to find out what was wrong. The issue was found and fixed so now I present you with Hello World for Game Boy.

The Code

The Gameboy CPU is based on the Z80 but have some small but significant changes to the instruction set. Some instructions are removed while other have been added. Most removed instructions have to do with I/O ports, but since the Game Boy have memory mapped I/O, they were not needed anyway. The added instructions makes more sense with the memory mapped I/O, you’ll see why soon enough.

This time I leave out the header and setup code from the blog post. If you want to try the code yourself, grab the source from bitbucket instead of trying to copy it from this post.

During the boot sequence the built-in ROM code will verify the cartridge header with some checksums and display the nintendo logo before jumping  to address $100. The only code at that address is a jump to $150, where we start to go through the code.

The first instruction is di, to disable interrupts before we switch off the LCD display.

According to this page, you should never disable the LCD outside the Vertical Blank period, because that could physically break the Game Boy hardware. So the loop above waits for the VBL before switching of the LCD.

The ldh instruction is one of the special instructions in the Game Boy. What it does is add $ff00 to the number between the parenthesis. So ldh a,($44) will fetch the value from address $ff44 and store it in register a. As it happens, the I/O ports are mapped to the address range $ff00-$ff7e, so this new instruction makes perfect sense. At address $ff44 we can find the current line that is currently sent to the LCD. The line 144 is the start of the VBL period, so subtract 144 from the register value, and jump to the nearest label named – (minus sign) above the current instruction if the result was not zero.

The following block of code will clear the memory used by the tile map that decides what is drawn on screen. Since our code is entered when the nintendo logo is displayed, this little loop will clear it.

The tile map can be found in the range $9800-$9bff, and we use another of the special instructions to clear it. ldd works like a normal ld instruction, except it also decreases the value of hl after the load. So ld (hl),a  will first copy the content of register a to where hl register pair points to, which will be $9bff the first iteration, and then decrease hl to $9bfe. To decide when all the memory is cleared, I grab the high byte of the pointer from h, subtracts $97 and check if the result is zero. If it is, that means the hl register just went from $9800 to $97ff, and the whole tile map has been cleared.

The next step is to copy our tile data to the tile data RAM. In this example, $8000 is the address for the tile data, so we load the hl register with that, and the we setup de register pair with the source address, which is at label gfx defined later in the code. As you may know, the original Game Boy is capable of displaying 4 different colours, but since I’m a lazy being I only made the graphics with one colour. Tiles are 8 by 8 pixels, using 2 bits per pixel giving 16 bytes per tile. The source data is only 1 bit per pixel; 8 bytes per tile and since we have 7 tiles, the first being an empty tile, we setup register b with the number of bytes to copy and use it as a counter.

When the counter is setup, we start the loop by fetching data from the source, then using the ldi instruction to store it to the destination twice. ldi is similar to ldd except that it increases hl instead of decrease after the load is done. This way our 8 bytes per tile source data will be written twice to the destination filling both bitplanes every line with the same source data. Then we increase the source pointer, and decrease the counter and to the looping magic until the counter reaches 0.

Now that the tile data is written, we can start using it by writing tile indices to the tile map memory. As before, we load the tile map address, set up b as a counter but this time we also load a with a 1. This is because we uploaded an empty tile as tile 0. Now we loop to write 1 to 6 to the first 6 entries in the tile map.

Almost done now. The only thing that is left is to enable the LCD again and loop forever. The highest bit we set is to enable the LCD, bit 4 is to select address $8000 to be the tile data address, and bit 0 is set to enable the background graphics, which is what we have been setting up to display our hello world text. The other bits in the LCD controller register are used to enable sprites and chose base address for tile map amongst other things.

When the LCD is enabled again, we halt the CPU to save batteries and jump back to halt again if it would ever wake up from the halt instruction.

Only the graphics data left. I’m starting to feel that this section is more or less implicit. This time I compacted it to hexadecimal syntax, with on tile per data row to reduce the space it took in the source code.

That’s it.

You might be curious of what bug delayed this post for almost three years. Well, it turned out I had missed the fact that you can’t write to the video RAM while it’s accessed by the LCD controller, so my tiles were only partially copied and the graphics were garbled.

Next up, I will probably write a post on the Mega Drive as the code was completed over a year ago. Until then, keep on coding!


  • GbdevWiki, where I found all information I could need
  • WLA-DX, used for assembling and generating a ROM.

Hello Atari XE

Time for a new CPU, and what better way to get starting with the 6502 that with the 8 bit computers from Atari. Sure, you might say that C64 or even NES would be more obvious platforms, but not for me. I am Atari to the bone, and despite C64 being more popular in some parts of the world, I don’t own one. I do own an Atari 130 XE though, and I recently purchased an SIO2SD device from Lotharek which enables me to actually test my code on the real thing.

The 8bit line of Ataris actually have more in common with the Amiga than the 16 bit Ataris. In the mid ’80s, the owners of Atari and Commodore more or less swapped companies, so the hardware designers who built the Atari 8bits later designed the Amiga hardware. And it shows. More about that in a bit.

This example uses the mads assembler that can produce an .xex file which is a loadable binary for the Atari 8bit. And even though the post is called Hello Atari XE, the following code should run on any 8bit Atari with OS/B or later. All XL and XE models should run it.

The code

Lets jump in to the code. We start by telling mads to generate a header for a loadable file, and set the start address to $2000.

The first thing we do when executing is loading the accumulator register, a with 0 and store them at address $d40e and $d400. Both of these memory locations are registers in the ANTIC chip. The first disables sources for the non-maskable interrupts and the later switches off the DMA for graphics. Documentation of the ANTIC can be found on Wikipedia. Also note the start label on the first line. This is later referenced as the entry point of the program.

Before going any further I must add a disclaimer. This code is not system friendly, and uses direct access to hardware registers. Most registers and vectors also have shadow registers that gets copied to the real registers during the system vertical blank (VBL) interrupt.

Since we write directly to the IO registers without using the shadow registers, we must install my own VBL handler to avoid having my values overwritten by the OS. The OS VBL interrupt handler jumps to a function pointed to by a vector at memory location $222. The following code first get to low byte of the address of the vbl_irq label, stores it at address $222 and then stores the high byte at $223.

Similar to how the Amiga copper lists work, the atari Antic chip executes commands in the list to feed the GTIA chip with graphics data. With out display list there will be no graphics so we set the display list address in the ANTIC chip to point to the one defined further down in the code.

Time for another short intermission, where I go through some features of the CPU. On the 6502, memory locations 0 to 255 are the so called zero-page. Memory in this page can be referred to with only one byte, which makes them faster to access. So for code that needs to be really fast, the zero-page is a good place to store data that is accessed a lot.

One other special page of memory is page 1, with addresses between 256 and 511. This space, and only this space is used by the stack. That way the stack “pointer” can be 8 bits instead of 16, and is more of an index into page 1 rather than an actual pointer.

Now it’s time to copy some graphics to the screen memory. To demonstrate a few different graphics modes, I made a function to copy the graphics to a destination buffer. The parameters to the function are stored at location $b0.w in zero-page ram, as well as in the accumulator register.

The destination, as in the screen memory is stored in $b0.w and the width of the screen memory for that mode is passed in register a.

When the parameters are set up we jump to the copy_1bit_gfx sub routine. That routine is explained in detail later.

Now we call the same sub routine again, twice but with other parameters, copying the graphics further down in the graphics buffer.

The first graphics mode we test takes 10 bytes per scanline, and the graphics consists of 7 line, that is why we offset the destination that many bytes in screen_mem. The second mode has 20 bytes per line, hence the second offset. Oh, and screen_mem is a reference to a label further down in the code.

The CTIA/GTIA chip, accessed at memory location $d000 through $d01f is responsible for generating the video output. It is fed with graphics data from the ANTIC and then adds colour and sprites before it’s output to the screen. The only interaction we need to do with the GTIA chip in this example is to set the colours we use. $d016 contains the colour for pixels in the two first modes in the example, $d01a is the background colour for the first two. $d017 and $d018 are used by the third mode, the first being text colour and the second the background.

We set the the background colours to black, represented by the value 0, and the text colour to white, which is $0e. The high nibble of byte is the colour, and the low nibble is the luminance of the colour. So a 0 in the high nibble will produce different shades of grey depending on the low nibble. The reason it’s an $0e and not $0f is because the low bit in the luminance is not used, which limits the total number of available colours to 128.

Once the data is copied and the colours are set, we are ready to enable VBL interrupt and
start the DMA fetch.  The two least significant bits of the DMA control register also determines how wide the so called playfield area is, which is screen memory. We set it to 2 to indicate normal width. Any other value here would force us to change the value passed in register a to copy_1bit_gfx when copying the graphics. Setting bit 5 (value of 32) will enable the DMA fetch of the display list data.

Since we have no interaction and animations, when can just stop here by having an infinite loop.

Time to dive into the VBL interrupt handler we installed way early in the code. It’s not very much to deep into really. Since this code is jumped to by the OS VBI handler, after it saved the registers on the stack, we need to restore them before leaving the handler.

pla pulls (aka pops) a byte from the stack into the a register. tay transfers the content of register a to register y. You can guess what tax does. rti returs from the interrupt, basically pulling the status register and return addresses from the stack.

Finally we get to the sub routine used to copy the graphics data to the screen buffer. It takes two parameters, the destination address which should be written to address $b0.w, and the number of bytes to skip between each line in register a.

The first thing we do is store the row count at address $b4 and setup a pointer to the graphics data at address $b2. This pointer will be used in the copy loop coming up next.

The above code is were we do the actual copying of data. Register x is used as a row counter, and register y is used as an index in both the source and data buffers. The destination pointer was the one passed at address $b0. In the loop, we read a byte from address stored in $b2 + content of register y, and write the same byte to address in $b2 + content of y. With the dey instruction, which stands for Decrease Y we subtract 1 from y.

The bpl instruction branches to the passed address if the result of the dey was positive, so it will jump to the label byte_loop until y counted down to -1. The we use a similar instruction to decrease x, and if x reached zero, we jump to the copy_done label.

If we reach the next part of the code, it means that one of the first 5 rows were copied to the screen buffer. After row 6 is copied, we jump directly to the copy_done label, so the following code will not be run.

Now it’s time to update the destination pointer to the next line. Since we stored away the line width in $b4, we read it back into the accumulator. As the only add instruction in the 6502 also adds the carry bit, we first need to clear the carry bit. Once the carry is taken care of with the clc (CLear Carry) instruction, we add the low byte of the destination address to the accumulator and write back the result. Since the destination is a 16 bit value, and the add in 8 bit, we need to add the carry to the upper byte of the pointer. So we clear the accumulator and add upper byte of the pointer together with the carry bit to the accumulator, and of course write it back.

The next snippet of code does essentially the exact same thing as the code above, except it updates the source pointer with the number of bytes per line in the source, which happens to be 6.

Almost done! The only thing we need to do now jump back to the row_loop label to copy the next row. And of course we have to declare the copy_done label we jump to after the last row was copied. The rts is a simple ReTurn from Subroutine, pulling the return address from the stack, where it gets pushed by the corresponding jsr instruction.

And there we go, all code is done. But there is perhaps the most essential part left…

The display list

The way to tell the ANTIC how to produce graphics data is through the display list. Similar to the Amiga copper list, the list contains instructions that tells the ANTIC what graphics mode to use and where to fetch it’s data, plus some extra bling. With no further ado, here it is.

Most instructions in the display list consists of one byte, but some consist of three. As you can see, the list starts with six bytes with the value $70. That instruction will yield 8 empty lines, so at the top of the screen there will be 48 empty lines.

What comes next is a little bit more interesting. If the low nibble of an instruction is higher than 1, that nibble represents a graphics mode. In this case, mode $9 is a bitmapped 2 color mode with 80 pixels per scan line (in normal width, described above when enabling DMA). The $4 in the high nibble tells the ANTIC to use the next two bytes as the source address to fetch data for the graphics. The next line declares a word value, containing the address of screen_mem.

For every line in the bitmapped mode, we need to tell the display list what mode to use, so we declare 6 more lines of mode 9, since our graphics data is 7 lines high. The following 14 bytes declare 7 lines of mode $b (same as 9, but double the resolution), and 7 lines of mode $f. The last mode being a bit special. It uses another set of palette entries (that we set up earlier), and also does some intentional colour bleeding. As an effect, we get double the resolution of the previous mode, but with some strange colours. In an emulator it doesn’t look good at all, but on a real machine, the bleeding effect does what it’s supposed to and it looks pretty nice.

The last instruction of our display list is $41. The 1 in the low nibble indicates that this is a jump instruction, which means that the following two bytes will be used as the new display list pointer. The $4 in the upper nibble tells the ANTIC to stop serving more data until the next frame.

Data and screen buffer

Not much here but the over-used Hello world text, in 1 bit graphics.

At the very end of the program, we define the label screen_mem. This means that the screen buffer will start at this address when the program is loaded.

Finally there is a run directive for the mads assembler, that tells it to add section in the binary which tells the OS that loaded the application to jump to a specific address, in our case the start label, that we declared at the very top of the program.

That’s it!

Reference and links

PS. Using < and > in the code is really annoying. Either WordPress freaks out or the syntax highlighter does. If you find any &lt; of &gt; in the code, just mentally exchange them for < and >. Thanks.