Hello Atari XE

May 15, 2016

Time for a new CPU, and what better way to get starting with the 6502 that with the 8 bit computers from Atari. Sure, you might say that C64 or even NES would be more obvious platforms, but not for me. I am Atari to the bone, and despite C64 being more popular in some parts of the world, I don’t own one. I do own an Atari 130 XE though, and I recently purchased an SIO2SD device from Lotharek which enables me to actually test my code on the real thing.

The 8bit line of Ataris actually have more in common with the Amiga than the 16 bit Ataris. In the mid ’80s, the owners of Atari and Commodore more or less swapped companies, so the hardware designers who built the Atari 8bits later designed the Amiga hardware. And it shows. More about that in a bit.

This example uses the mads assembler that can produce an .xex file which is a loadable binary for the Atari 8bit. And even though the post is called Hello Atari XE, the following code should run on any 8bit Atari with OS/B or later. All XL and XE models should run it.

The code

Lets jump in to the code. We start by telling mads to generate a header for a loadable file, and set the start address to $2000.

 	opt h+
	org $2000

The first thing we do when executing is loading the accumulator register, a with 0 and store them at address $d40e and $d400. Both of these memory locations are registers in the ANTIC chip. The first disables sources for the non-maskable interrupts and the later switches off the DMA for graphics. Documentation of the ANTIC can be found on Wikipedia. Also note the start label on the first line. This is later referenced as the entry point of the program.

	start:
	lda #0
	sta $d40e
	sta $d400

Before going any further I must add a disclaimer. This code is not system friendly, and uses direct access to hardware registers. Most registers and vectors also have shadow registers that gets copied to the real registers during the system vertical blank (VBL) interrupt.

Since we write directly to the IO registers without using the shadow registers, we must install my own VBL handler to avoid having my values overwritten by the OS. The OS VBL interrupt handler jumps to a function pointed to by a vector at memory location $222. The following code first get to low byte of the address of the vbl_irq label, stores it at address $222 and then stores the high byte at $223.

	lda #vbl_irq
	sta $223

Similar to how the Amiga copper lists work, the atari Antic chip executes commands in the list to feed the GTIA chip with graphics data. With out display list there will be no graphics so we set the display list address in the ANTIC chip to point to the one defined further down in the code.

	lda #disp_list
	sta $d403

Time for another short intermission, where I go through some features of the CPU. On the 6502, memory locations 0 to 255 are the so called zero-page. Memory in this page can be referred to with only one byte, which makes them faster to access. So for code that needs to be really fast, the zero-page is a good place to store data that is accessed a lot.

One other special page of memory is page 1, with addresses between 256 and 511. This space, and only this space is used by the stack. That way the stack “pointer” can be 8 bits instead of 16, and is more of an index into page 1 rather than an actual pointer.

Now it’s time to copy some graphics to the screen memory. To demonstrate a few different graphics modes, I made a function to copy the graphics to a destination buffer. The parameters to the function are stored at location $b0.w in zero-page ram, as well as in the accumulator register.

The destination, as in the screen memory is stored in $b0.w and the width of the screen memory for that mode is passed in register a.

When the parameters are set up we jump to the copy_1bit_gfx sub routine. That routine is explained in detail later.

	lda	#screen_mem
	sta	$b1
	lda	#10
	jsr	copy_1bit_gfx

Now we call the same sub routine again, twice but with other parameters, copying the graphics further down in the graphics buffer.

The first graphics mode we test takes 10 bytes per scanline, and the graphics consists of 7 line, that is why we offset the destination that many bytes in screen_mem. The second mode has 20 bytes per line, hence the second offset. Oh, and screen_mem is a reference to a label further down in the code.

	lda	#(screen_mem+10*7)
	sta	$b1
	lda	#20
	jsr	copy_1bit_gfx

	lda	#(screen_mem+10*7 + 20*7)
	sta	$b1
	lda	#40
	jsr	copy_1bit_gfx

The CTIA/GTIA chip, accessed at memory location $d000 through $d01f is responsible for generating the video output. It is fed with graphics data from the ANTIC and then adds colour and sprites before it’s output to the screen. The only interaction we need to do with the GTIA chip in this example is to set the colours we use. $d016 contains the colour for pixels in the two first modes in the example, $d01a is the background colour for the first two. $d017 and $d018 are used by the third mode, the first being text colour and the second the background.

We set the the background colours to black, represented by the value 0, and the text colour to white, which is $0e. The high nibble of byte is the colour, and the low nibble is the luminance of the colour. So a 0 in the high nibble will produce different shades of grey depending on the low nibble. The reason it’s an $0e and not $0f is because the low bit in the luminance is not used, which limits the total number of available colours to 128.

	lda	#$0e
	sta	$d016
	sta	$d017

	lda	#$0
	sta	$d018
	sta	$d01a

Once the data is copied and the colours are set, we are ready to enable VBL interrupt and start the DMA fetch. The two least significant bits of the DMA control register also determines how wide the so called playfield area is, which is screen memory. We set it to 2 to indicate normal width. Any other value here would force us to change the value passed in register a to copy_1bit_gfx when copying the graphics. Setting bit 5 (value of 32) will enable the DMA fetch of the display list data.

	lda	#$c0
	sta	$d40e
	lda	#32+2
	sta	$d400

Since we have no interaction and animations, when can just stop here by having an infinite loop.

loop
    jmp loop

Time to dive into the VBL interrupt handler we installed way early in the code. It’s not very much to deep into really. Since this code is jumped to by the OS VBI handler, after it saved the registers on the stack, we need to restore them before leaving the handler.

pla pulls (aka pops) a byte from the stack into the a register. tay transfers the content of register a to register y. You can guess what tax does. rti returs from the interrupt, basically pulling the status register and return addresses from the stack.

vbl_irq:
	pla
	tay
	pla
	tax
	pla
	rti

Finally we get to the sub routine used to copy the graphics data to the screen buffer. It takes two parameters, the destination address which should be written to address $b0.w, and the number of bytes to skip between each line in register a.

copy_1bit_gfx:
	sta	$b4
	lda	#hello_world_1col
	sta	$b3

The first thing we do is store the row count at address $b4 and setup a pointer to the graphics data at address $b2. This pointer will be used in the copy loop coming up next.

	ldx	#6
row_loop:
	ldy	#5
byte_loop:
	lda	($b2),y
	sta	($b0),y
	dey
	bpl	byte_loop
	dex
	beq	copy_done

The above code is were we do the actual copying of data. Register x is used as a row counter, and register y is used as an index in both the source and data buffers. The destination pointer was the one passed at address $b0. In the loop, we read a byte from address stored in $b2 + content of register y, and write the same byte to address in $b2 + content of y. With the dey instruction, which stands for Decrease Y we subtract 1 from y.

The bpl instruction branches to the passed address if the result of the dey was positive, so it will jump to the label byte_loop until y counted down to -1. The we use a similar instruction to decrease x, and if x reached zero, we jump to the copy_done label.

If we reach the next part of the code, it means that one of the first 5 rows were copied to the screen buffer. After row 6 is copied, we jump directly to the copy_done label, so the following code will not be run.

	lda	$b4
	clc
	adc	$b0
	sta	$b0
	lda	#0
	adc	$b1
	sta	$b1

Now it’s time to update the destination pointer to the next line. Since we stored away the line width in $b4, we read it back into the accumulator. As the only add instruction in the 6502 also adds the carry bit, we first need to clear the carry bit. Once the carry is taken care of with the clc (CLear Carry) instruction, we add the low byte of the destination address to the accumulator and write back the result. Since the destination is a 16 bit value, and the add in 8 bit, we need to add the carry to the upper byte of the pointer. So we clear the accumulator and add upper byte of the pointer together with the carry bit to the accumulator, and of course write it back.

The next snippet of code does essentially the exact same thing as the code above, except it updates the source pointer with the number of bytes per line in the source, which happens to be 6.

	lda	#6
	clc
	adc	$b2
	sta	$b2
	lda	#0
	adc	$b3
	sta	$b3

Almost done! The only thing we need to do now jump back to the row_loop label to copy the next row. And of course we have to declare the copy_done label we jump to after the last row was copied. The rts is a simple ReTurn from Subroutine, pulling the return address from the stack, where it gets pushed by the corresponding jsr instruction.

	jmp	row_loop
copy_done:
	rts

And there we go, all code is done. But there is perhaps the most essential part left…

The display list

The way to tell the ANTIC how to produce graphics data is through the display list. Similar to the Amiga copper list, the list contains instructions that tells the ANTIC what graphics mode to use and where to fetch it’s data, plus some extra bling. With no further ado, here it is.

disp_list:
	.db $70,$70,$70,$70,$70,$70
	.db $49
	.dw screen_mem
	.db $9, $9, $9, $9, $9, $9
	.db $b, $b, $b, $b, $b, $b, $b
	.db $f, $f, $f, $f, $f, $f, $f
	.db $41
	.dw disp_list

Most instructions in the display list consists of one byte, but some consist of three. As you can see, the list starts with six bytes with the value $70. That instruction will yield 8 empty lines, so at the top of the screen there will be 48 empty lines.

What comes next is a little bit more interesting. If the low nibble of an instruction is higher than 1, that nibble represents a graphics mode. In this case, mode $9 is a bitmapped 2 color mode with 80 pixels per scan line (in normal width, described above when enabling DMA). The $4 in the high nibble tells the ANTIC to use the next two bytes as the source address to fetch data for the graphics. The next line declares a word value, containing the address of screen_mem.

For every line in the bitmapped mode, we need to tell the display list what mode to use, so we declare 6 more lines of mode 9, since our graphics data is 7 lines high. The following 14 bytes declare 7 lines of mode $b (same as 9, but double the resolution), and 7 lines of mode $f. The last mode being a bit special. It uses another set of palette entries (that we set up earlier), and also does some intentional colour bleeding. As an effect, we get double the resolution of the previous mode, but with some strange colours. In an emulator it doesn’t look good at all, but on a real machine, the bleeding effect does what it’s supposed to and it looks pretty nice.

The last instruction of our display list is $41. The 1 in the low nibble indicates that this is a jump instruction, which means that the following two bytes will be used as the new display list pointer. The $4 in the upper nibble tells the ANTIC to stop serving more data until the next frame.

Data and screen buffer

Not much here but the over-used Hello world text, in 1 bit graphics.

hello_world_1col:
    .db    %01000100,%00000101,%00000001,%00010000,%00000001,%00001010
    .db    %01000100,%11100101,%00110001,%00010011,%00011001,%00111010
    .db    %01000101,%00010101,%01001001,%00010100,%10100101,%01001010
    .db    %01111101,%11110101,%01001001,%01010100,%10100001,%01001010
    .db    %01000101,%00000101,%01001001,%01010100,%10100001,%01001000
    .db    %01000100,%11110101,%00110000,%10100011,%00100001,%00111010

At the very end of the program, we define the label screen_mem. This means that the screen buffer will start at this address when the program is loaded.

screen_mem:

Finally there is a run directive for the mads assembler, that tells it to add section in the binary which tells the OS that loaded the application to jump to a specific address, in our case the start label, that we declared at the very top of the program.

    run start

That’s it!

PS. It seems that the conversion from Wordpress to Hugo removed some < and > characters from the code, so it might not work. The source on gitbucket should still work though.

Reference and links

MADS Cross assembler I used. WUDSN hosts binaries Mac OS X and linux
ANTIC chip on Wikipedia
CTIA/GTIA on Wikipedia
6502 instruction set