Category Archives: Hello World Project

The Hello World Project.
Small assembly hacks on old school hardware.

Hello Game Boy

This post has been postponed for a very long time. I started writing it almost 3 years ago, but due to a bug in the code I never got to finish it. When I was done with the Atari 8bit post, I started looking at this code again to find out what was wrong. The issue was found and fixed so now I present you with Hello World for Game Boy.

The Code

The Game Boy CPU is based on the Z80 but has some small but significant changes to the instruction set. Some instructions have been removed while others have been added. Most of the removed instructions have to do with I/O ports, but since the Game Boy has memory mapped I/O, they were not needed anyway. The added instructions make more sense with the memory mapped I/O; you’ll see why soon enough.

This time I leave out the header and setup code from the blog post. If you want to try the code yourself, grab the source from bitbucket instead of trying to copy it from this post.

During the boot sequence the built-in ROM code will verify the cartridge header with some checksums and display the Nintendo logo before jumping to address $100. The only code at that address is a jump to $150, where we start to go through the code.

The first instruction is di, to disable interrupts before we switch off the LCD display.

According to this page, you should never disable the LCD outside the Vertical Blank period, because that could physically break the Game Boy hardware. So the code first waits in a loop for the VBL before switching off the LCD.

The ldh instruction is one of the special instructions in the Game Boy. What it does is add $ff00 to the number between the parentheses. So ldh a,($44) will fetch the value from address $ff44 and store it in register a. As it happens, the I/O ports are mapped to the address range $ff00-$ff7f, so this new instruction makes perfect sense. At address $ff44 we find the number of the line currently being sent to the LCD. Line 144 is the start of the VBL period, so we subtract 144 from the register value and, if the result is not zero, jump back to the nearest label named - (minus sign) above the current instruction.
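
The address arithmetic and the wait loop can be modelled in a few lines of Python. This is only a sketch of the idea, not the assembly from the post; the function names and the read_byte callback are made up for illustration.

```python
IO_BASE = 0xFF00  # ldh adds this base to its 8-bit operand


def ldh_address(offset):
    """Return the full address accessed by ldh a,(offset)."""
    return IO_BASE + (offset & 0xFF)


def wait_for_vblank(read_byte):
    """Poll the current-line register until it reads 144, the first VBL line.

    read_byte is a hypothetical callback returning the byte at an address.
    """
    while read_byte(ldh_address(0x44)) != 144:
        pass
```

Calling ldh_address(0x44) gives 0xFF44, the register polled by the loop.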

The following block of code will clear the memory used by the tile map that decides what is drawn on screen. Since our code is entered while the Nintendo logo is displayed, this little loop will clear it.

The tile map can be found in the range $9800-$9bff, and we use another of the special instructions to clear it. ldd works like a normal ld instruction, except it also decreases the value of hl after the load. So ldd (hl),a will first copy the content of register a to where the hl register pair points, which will be $9bff in the first iteration, and then decrease hl to $9bfe. To decide when all the memory is cleared, I grab the high byte of the pointer from h, subtract $97 and check if the result is zero. If it is, that means the hl register just went from $9800 to $97ff, and the whole tile map has been cleared.
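
To convince yourself that the end condition covers exactly the tile map, here is a small Python model of the loop. It is a sketch under the assumption that the loop works as described above; memory is just a bytearray standing in for the Game Boy address space.

```python
def clear_tile_map(memory):
    """Clear $9800-$9bff the way the described ldd loop does."""
    a = 0
    hl = 0x9BFF
    while True:
        memory[hl] = a              # ldd (hl),a: store a where hl points ...
        hl = (hl - 1) & 0xFFFF      # ... then decrease hl
        if (hl >> 8) - 0x97 == 0:   # high byte is $97: hl just became $97ff
            break
    return hl
```

The function returns the final value of hl, which should be $97ff, with 1024 bytes cleared on the way.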

The next step is to copy our tile data to the tile data RAM. In this example, $8000 is the address for the tile data, so we load the hl register with that, and then we set up the de register pair with the source address, which is at the label gfx defined later in the code. As you may know, the original Game Boy is capable of displaying 4 different colours, but since I’m a lazy being I only made the graphics with one colour. Tiles are 8 by 8 pixels, using 2 bits per pixel, giving 16 bytes per tile. The source data is only 1 bit per pixel, or 8 bytes per tile, and since we have 7 tiles, the first being an empty tile, we set up register b with the number of bytes to copy (7 tiles of 8 bytes, so 56) and use it as a counter.

When the counter is set up, we start the loop by fetching data from the source, then using the ldi instruction to store it to the destination twice. ldi is similar to ldd except that it increases hl instead of decreasing it after the load is done. This way our 8 bytes per tile source data will be written twice to the destination, filling both bitplanes of every line with the same source data. Then we increase the source pointer, decrease the counter and do the looping magic until the counter reaches 0.
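
The effect of writing every byte twice is easy to show in Python. This is a model of the copy loop as described, not the original assembly, and the function name is invented for the sketch.

```python
def expand_1bpp_to_2bpp(source):
    """Write every source byte twice so both bitplanes of a row match."""
    dest = bytearray()
    for byte in source:
        dest.append(byte)  # first ldi (hl),a: one bitplane of the row
        dest.append(byte)  # second ldi (hl),a: the other bitplane
    return bytes(dest)
```

With 56 bytes of source data this produces 112 bytes, which is 7 tiles of 16 bytes each. Because both bitplanes are identical, every pixel is either colour 0 or colour 3.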

Now that the tile data is written, we can start using it by writing tile indices to the tile map memory. As before, we load the tile map address, set up b as a counter but this time we also load a with a 1. This is because we uploaded an empty tile as tile 0. Now we loop to write 1 to 6 to the first 6 entries in the tile map.

Almost done now. The only thing that is left is to enable the LCD again and loop forever. The highest bit we set is to enable the LCD, bit 4 is to select address $8000 as the tile data address, and bit 0 is set to enable the background graphics, which is what we have been setting up to display our hello world text. The other bits in the LCD controller register are used to enable sprites and choose the base address for the tile map, amongst other things.
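
Putting the three bits together gives the value written to the LCD control register. The constant names below are made up for this Python sketch; only the bit positions come from the text.

```python
LCD_ENABLE = 1 << 7         # highest bit: switch the LCD on
TILE_DATA_AT_8000 = 1 << 4  # bit 4: tile data is read from $8000
BG_ENABLE = 1 << 0          # bit 0: show the background

lcdc_value = LCD_ENABLE | TILE_DATA_AT_8000 | BG_ENABLE
print(hex(lcdc_value))      # 0x91
```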

When the LCD is enabled again, we halt the CPU to save batteries and jump back to halt again if it would ever wake up from the halt instruction.

Only the graphics data left. I’m starting to feel that this section is more or less implicit. This time I compacted it to hexadecimal syntax, with one tile per data row to reduce the space it takes in the source code.

That’s it.

You might be curious about what bug delayed this post for almost three years. Well, it turned out I had missed the fact that you can’t write to the video RAM while it’s accessed by the LCD controller, so my tiles were only partially copied and the graphics were garbled.

Next up, I will probably write a post on the Mega Drive as the code was completed over a year ago. Until then, keep on coding!

Links

  • GbdevWiki, where I found all information I could need
  • WLA-DX, used for assembling and generating a ROM.

Hello Atari XE

Time for a new CPU, and what better way to get started with the 6502 than with the 8 bit computers from Atari. Sure, you might say that C64 or even NES would be more obvious platforms, but not for me. I am Atari to the bone, and despite C64 being more popular in some parts of the world, I don’t own one. I do own an Atari 130 XE though, and I recently purchased an SIO2SD device from Lotharek which enables me to actually test my code on the real thing.

The 8bit line of Ataris actually has more in common with the Amiga than with the 16 bit Ataris. In the mid ’80s, the owners of Atari and Commodore more or less swapped companies, so the hardware designers who built the Atari 8bits later designed the Amiga hardware. And it shows. More about that in a bit.

This example uses the mads assembler that can produce an .xex file which is a loadable binary for the Atari 8bit. And even though the post is called Hello Atari XE, the following code should run on any 8bit Atari with OS/B or later. All XL and XE models should run it.

The code

Let’s jump into the code. We start by telling mads to generate a header for a loadable file, and set the start address to $2000.

The first thing we do when executing is load the accumulator register, a, with 0 and store it at addresses $d40e and $d400. Both of these memory locations are registers in the ANTIC chip. The first disables the sources for the non-maskable interrupts and the latter switches off the DMA for graphics. Documentation of the ANTIC can be found on Wikipedia. Also note the start label on the first line. This is later referenced as the entry point of the program.

Before going any further I must add a disclaimer. This code is not system friendly, and uses direct access to hardware registers. Most registers and vectors also have shadow registers that get copied to the real registers during the system vertical blank (VBL) interrupt.

Since we write directly to the IO registers without using the shadow registers, we must install our own VBL handler to avoid having our values overwritten by the OS. The OS VBL interrupt handler jumps to a function pointed to by a vector at memory location $222. The following code first gets the low byte of the address of the vbl_irq label, stores it at address $222 and then stores the high byte at $223.
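
Splitting a 16-bit address into a low and a high byte is something you do all the time on the 6502. Here is the idea as a small Python sketch; the helper names and the handler address used in the usage note are hypothetical.

```python
def split_address(address):
    """Return the (low byte, high byte) pair of a 16-bit address."""
    return address & 0xFF, (address >> 8) & 0xFF


def install_vector(memory, vector, handler_address):
    """Store a handler address at a two-byte vector, low byte first."""
    low, high = split_address(handler_address)
    memory[vector] = low        # e.g. $222
    memory[vector + 1] = high   # e.g. $223
```

If vbl_irq happened to live at $2040, install_vector(memory, 0x222, 0x2040) would store $40 at $222 and $20 at $223.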

Similar to how the Amiga copper lists work, the Atari ANTIC chip executes commands in a list to feed the GTIA chip with graphics data. Without a display list there will be no graphics, so we set the display list address in the ANTIC chip to point to the one defined further down in the code.

Time for another short intermission, where I go through some features of the CPU. On the 6502, memory locations 0 to 255 are the so called zero-page. Memory in this page can be referred to with only one byte, which makes them faster to access. So for code that needs to be really fast, the zero-page is a good place to store data that is accessed a lot.

One other special page of memory is page 1, with addresses between 256 and 511. This space, and only this space is used by the stack. That way the stack “pointer” can be 8 bits instead of 16, and is more of an index into page 1 rather than an actual pointer.

Now it’s time to copy some graphics to the screen memory. To demonstrate a few different graphics modes, I made a function to copy the graphics to a destination buffer. The parameters to the function are stored at location $b0.w in zero-page ram, as well as in the accumulator register.

The destination, as in the screen memory is stored in $b0.w and the width of the screen memory for that mode is passed in register a.

When the parameters are set up we jump to the copy_1bit_gfx sub routine. That routine is explained in detail later.

Now we call the same sub routine again, twice but with other parameters, copying the graphics further down in the graphics buffer.

The first graphics mode we test takes 10 bytes per scanline, and the graphics consist of 7 lines; that is why we offset the destination by that many bytes in screen_mem. The second mode has 20 bytes per line, hence the second offset. Oh, and screen_mem is a reference to a label further down in the code.
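
The offsets are simple arithmetic, shown here in Python rather than assembly. This sketch assumes that the data for each mode directly follows the data for the previous one in screen_mem; the function name is made up.

```python
def offset_after(modes):
    """Sum the bytes used by (lines, bytes_per_line) pairs that come first."""
    return sum(lines * bytes_per_line for lines, bytes_per_line in modes)


second_offset = offset_after([(7, 10)])           # after 7 lines of 10 bytes
third_offset = offset_after([(7, 10), (7, 20)])   # plus 7 lines of 20 bytes
```

Under that assumption the second copy starts 70 bytes into screen_mem and the third one 210 bytes in.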

The CTIA/GTIA chip, accessed at memory location $d000 through $d01f is responsible for generating the video output. It is fed with graphics data from the ANTIC and then adds colour and sprites before it’s output to the screen. The only interaction we need to do with the GTIA chip in this example is to set the colours we use. $d016 contains the colour for pixels in the two first modes in the example, $d01a is the background colour for the first two. $d017 and $d018 are used by the third mode, the first being text colour and the second the background.

We set the background colours to black, represented by the value 0, and the text colour to white, which is $0e. The high nibble of the byte is the colour, and the low nibble is the luminance of the colour. So a 0 in the high nibble will produce different shades of grey depending on the low nibble. The reason it’s $0e and not $0f is that the low bit in the luminance is not used, which limits the total number of available colours to 128.
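
The nibble layout, and why there are 128 colours, can be checked with a short Python sketch. The function name is invented for the example.

```python
def atari_colour(hue, luminance):
    """Pack a colour byte: hue in the high nibble, luminance in the low.

    The lowest luminance bit is unused, so it is masked away here.
    """
    return ((hue & 0x0F) << 4) | (luminance & 0x0E)


all_colours = {atari_colour(h, l) for h in range(16) for l in range(16)}
```

Here atari_colour(0, 15) and atari_colour(0, 14) both give $0e, and the set contains 16 hues times 8 luminance steps, so 128 distinct values.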

Once the data is copied and the colours are set, we are ready to enable the VBL interrupt and start the DMA fetch. The two least significant bits of the DMA control register also determine how wide the so called playfield area, i.e. the screen memory, is. We set them to 2 to indicate normal width. Any other value here would force us to change the value passed in register a to copy_1bit_gfx when copying the graphics. Setting bit 5 (value of 32) will enable the DMA fetch of the display list data.
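
Combining the two fields gives the value for the DMA control register. The constant names in this Python sketch are made up; the bit values are the ones from the text.

```python
PLAYFIELD_NORMAL = 2        # two lowest bits: normal playfield width
DISPLAY_LIST_DMA = 1 << 5   # bit 5 (32): fetch the display list via DMA

dmactl_value = PLAYFIELD_NORMAL | DISPLAY_LIST_DMA
print(hex(dmactl_value))    # 0x22
```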

Since we have no interaction or animations, we can just stop here by having an infinite loop.

Time to dive into the VBL interrupt handler we installed way early in the code. There is not very much to dive into, really. Since this code is jumped to by the OS VBI handler after it has saved the registers on the stack, we need to restore them before leaving the handler.

pla pulls (aka pops) a byte from the stack into the a register. tay transfers the content of register a to register y. You can guess what tax does. rti returns from the interrupt, basically pulling the status register and return address from the stack.

Finally we get to the sub routine used to copy the graphics data to the screen buffer. It takes two parameters, the destination address which should be written to address $b0.w, and the number of bytes to skip between each line in register a.

The first thing we do is store the line width passed in register a at address $b4 and set up a pointer to the graphics data at address $b2. This pointer will be used in the copy loop coming up next.

The above code is where we do the actual copying of data. Register x is used as a row counter, and register y is used as an index into both the source and destination buffers. The destination pointer is the one passed at address $b0. In the loop, we read a byte from the address stored in $b2 + the content of register y, and write the same byte to the address stored in $b0 + the content of y. With the dey instruction, which stands for DEcrement Y, we subtract 1 from y.

The bpl instruction branches to the given address if the result of the dey was not negative, so it will jump to the label byte_loop until y has counted down to -1. Then we use a similar instruction to decrease x, and if x has reached zero, we jump to the copy_done label.

If we reach the next part of the code, it means that one of the first 5 rows were copied to the screen buffer. After row 6 is copied, we jump directly to the copy_done label, so the following code will not be run.

Now it’s time to update the destination pointer to the next line. Since we stored away the line width in $b4, we read it back into the accumulator. As the only add instruction in the 6502 also adds the carry bit, we first need to clear the carry bit. Once the carry is taken care of with the clc (CLear Carry) instruction, we add the low byte of the destination address to the accumulator and write back the result. Since the destination is a 16 bit value, and the add is 8 bit, we need to add the carry to the upper byte of the pointer. So we clear the accumulator, add the upper byte of the pointer together with the carry bit to the accumulator, and of course write it back.
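
The same two-step addition can be modelled in Python to make the role of the carry bit visible. This is a sketch of the technique, not a transcription of the original code; the function name is invented.

```python
def add_to_pointer(memory, pointer, amount):
    """Add an 8-bit amount to a 16-bit little-endian pointer in memory."""
    carry = 0                                   # clc
    total = amount + memory[pointer] + carry    # adc with the low byte
    memory[pointer] = total & 0xFF              # write back the low byte
    carry = total >> 8                          # carry out of the 8-bit add

    total = 0 + memory[pointer + 1] + carry     # lda #0, adc with the high byte
    memory[pointer + 1] = total & 0xFF          # write back the high byte
```

For example, with $20fa stored at $b0 and an amount of 10, the low byte wraps around to $04 and the carry bumps the high byte to $21, giving $2104.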

The next snippet of code does essentially the exact same thing as the code above, except it updates the source pointer with the number of bytes per line in the source, which happens to be 6.

Almost done! The only thing we need to do now is jump back to the row_loop label to copy the next row. And of course we have to declare the copy_done label we jump to after the last row was copied. The rts is a simple ReTurn from Subroutine, pulling the return address from the stack, where it got pushed by the corresponding jsr instruction.
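
As a recap, here is what the whole subroutine does, modelled in Python. It is a sketch under a few assumptions: the source is 7 rows of 6 bytes, the byte index counts down from 5 to 0 as with dey and bpl, and buffers are plain byte sequences instead of zero-page pointers.

```python
SOURCE_BYTES_PER_LINE = 6
ROWS = 7


def copy_1bit_gfx(source, dest, dest_offset, dest_line_width):
    """Copy 7 rows of 6 bytes into dest, one row per screen line."""
    src = 0
    dst = dest_offset
    for _ in range(ROWS):
        # y counts down to 0; the loop ends when it would become -1
        for y in range(SOURCE_BYTES_PER_LINE - 1, -1, -1):
            dest[dst + y] = source[src + y]
        src += SOURCE_BYTES_PER_LINE   # next row in the source
        dst += dest_line_width         # next line in the screen buffer
```

The dest_line_width argument plays the role of the value passed in register a, so 10 would match the first graphics mode.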

And there we go, all code is done. But there is perhaps the most essential part left…

The display list

The way to tell the ANTIC how to produce graphics data is through the display list. Similar to the Amiga copper list, the list contains instructions that tell the ANTIC what graphics mode to use and where to fetch its data, plus some extra bling. With no further ado, here it is.

Most instructions in the display list consist of one byte, but some consist of three. As you can see, the list starts with six bytes with the value $70. That instruction will yield 8 empty lines, so at the top of the screen there will be 48 empty lines.

What comes next is a little bit more interesting. If the low nibble of an instruction is higher than 1, that nibble represents a graphics mode. In this case, mode $9 is a bitmapped 2 color mode with 80 pixels per scan line (in normal width, described above when enabling DMA). The $4 in the high nibble tells the ANTIC to use the next two bytes as the source address to fetch data for the graphics. The next line declares a word value, containing the address of screen_mem.

For every line in the bitmapped mode, we need to tell the display list what mode to use, so we declare 6 more lines of mode 9, since our graphics data is 7 lines high. The following 14 bytes declare 7 lines of mode $b (same as 9, but double the resolution), and 7 lines of mode $f. The last mode is a bit special. It uses another set of palette entries (that we set up earlier), and also does some intentional colour bleeding. As an effect, we get double the resolution of the previous mode, but with some strange colours. In an emulator it doesn’t look good at all, but on a real machine, the bleeding effect does what it’s supposed to and it looks pretty nice.

The last instruction of our display list is $41. The 1 in the low nibble indicates that this is a jump instruction, which means that the following two bytes will be used as the new display list pointer. The $4 in the upper nibble tells the ANTIC to stop serving more data until the next frame.
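
To summarise the layout, this Python sketch builds the same sequence of bytes as described above. It assumes that the final jump points back to the start of the display list, and the two addresses in the usage note are hypothetical.

```python
def build_display_list(screen_mem, list_address):
    """Return the display list bytes described in the text."""
    dl = [0x70] * 6                                    # 6 * 8 empty lines
    dl += [0x40 | 0x09, screen_mem & 0xFF, screen_mem >> 8]  # mode 9 + address
    dl += [0x09] * 6                                   # 6 more lines of mode 9
    dl += [0x0B] * 7                                   # 7 lines of mode $b
    dl += [0x0F] * 7                                   # 7 lines of mode $f
    dl += [0x41, list_address & 0xFF, list_address >> 8]     # jump, wait for VBL
    return bytes(dl)
```

build_display_list(0x3000, 0x2100) gives a 32 byte list where the seventh to ninth bytes are $49, $00, $30.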

Data and screen buffer

Not much here but the over-used Hello world text, in 1 bit graphics.

At the very end of the program, we define the label screen_mem. This means that the screen buffer will start at this address when the program is loaded.

Finally there is a run directive for the mads assembler, which tells it to add a section in the binary that tells the OS that loaded the application to jump to a specific address, in our case the start label that we declared at the very top of the program.

That’s it!

Reference and links

PS. Using < and > in the code is really annoying. Either WordPress freaks out or the syntax highlighter does. If you find any &lt; or &gt; in the code, just mentally exchange them for < and >. Thanks.

Hello Gameboy Advance

Last weekend while digging around on one of my USB sticks, I found some old test code I wrote for the GBA over 10 years ago. I thought the files were lost forever, so I was quite happy that I found them. The code itself was written in an obscure assembler for Windows, so my first goal was to port it to a more modern tool that preferably could cross assemble on both my PC and Mac. I decided to give vasm a try. After some initial problems with strange syntax and me not understanding what the error messages meant, I got it to assemble into a binary file. To my surprise, the resulting binary could be loaded into the emulator and it worked the way it should! From there I cleaned up the code, removing stuff I didn’t need for the Hello World Project, and added the result to the repository.

The code

This is by far the shortest example to date. The video mode I chose made it very easy to draw graphics on screen, and the setup required is minimal. A short note before we continue; the semantics of the data types are not the same as on the 68k processor. On Arm, a word is 32-bit, while on the 68k it’s 16-bit. This can occasionally cause some confusion, at least it has for me when my focus was not high enough.

At the beginning we tell the assembler to use the arm instruction set (which is 32 bit, as opposed to the thumb instruction set that has 16 bit instructions). Then we tell it that the following code will be executed at position 0x08000000, which is where the GBA ROM entry point is.

The first instruction we run makes a jump (branch) over the ROM header. After the branch, we define the GBA header. I have no idea why the values in the header are what they are or if they are correct, but this is what my old code did, and the emulators I’ve tried haven’t complained yet.

Let’s begin by setting the display mode. By setting Mode 4, we get a 256 colour chunky graphics mode, i.e. each byte in the graphics memory is the index of the palette colour shown at that position. In Mode 4, only background 2 is used, so we need to enable that background layer.
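
As a sketch of what ends up in the display control register, here is the value computed in Python. The constant names are made up, and the example assumes that the mode number sits in the lowest bits and that bit 10 enables background 2.

```python
MODE_4 = 4             # lowest bits: video mode number
BG2_ENABLE = 1 << 10   # assumed bit for enabling background 2

display_control = MODE_4 | BG2_ENABLE
print(hex(display_control))  # 0x404
```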

Time to set up the palette. Since the graphics only consist of two colours, we read both colours from the palette data in one go and write them to the palette memory. Each entry in the palette consists of two bytes, with packed RGB data, 5 bits per channel. The least significant bits are the red channel, the next 5 bits are green and finally 5 bits represent the blue channel. The most significant bit is ignored.
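
The packing is easy to get wrong, so here it is as a small Python sketch. The function names are invented; the second helper shows how two 16-bit entries fit in one 32-bit word, assuming little-endian order.

```python
def pack_colour(red, green, blue):
    """Pack three 5-bit channels: red lowest, then green, then blue."""
    return (red & 0x1F) | ((green & 0x1F) << 5) | ((blue & 0x1F) << 10)


def palette_word(first, second):
    """Combine two palette entries into one 32-bit little-endian word."""
    return first | (second << 16)


black = pack_colour(0, 0, 0)
white = pack_colour(31, 31, 31)
```

White comes out as $7fff, and black followed by white as the single word $7fff0000.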

Time to update the screen with some pixels. Here we set up two nested loops, one for rows and one for columns. For each pixel on each row, a colour entry is read from the pixel data and written to the screen. To optimize a bit, we handle four pixels at a time. Also, it is not possible to write just one byte to the VRAM; doing so will result in the same value being written to both bytes of the 16-bit location.
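
Here is a Python sketch of both points: four palette indices packed into one 32-bit word, and what a single byte write does to a 16-bit VRAM location. The function names are made up, and little-endian byte order is assumed.

```python
def pack_pixels(p0, p1, p2, p3):
    """Pack four 8-bit palette indices, leftmost pixel in the lowest byte."""
    return p0 | (p1 << 8) | (p2 << 16) | (p3 << 24)


def byte_write_to_vram(value):
    """Model a byte write: the value lands in both halves of the halfword."""
    return (value & 0xFF) | ((value & 0xFF) << 8)
```

pack_pixels(1, 0, 0, 1) gives $01000001, and byte_write_to_vram(1) gives $0101, which is two pixels instead of one.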

There is one thing in the loop above that might be hard to understand without further explanation: the [r4,#4]! syntax. What it does is access the data at address r4 + 4 bytes; the exclamation mark at the end indicates that the r4 register will be updated with the same offset that was used in the access. So r4 ends up 4 higher, pointing at the data that was just accessed. r3 and r4 will therefore increase with each read/write, so the source and destination addresses are updated for every iteration of the loop.
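
The behaviour can be modelled in Python with a dictionary for the registers. This is only an illustration of pre-indexed addressing with write-back; the helper name and the values in the usage note are invented.

```python
def load_pre_indexed(memory, registers, base, offset):
    """Model ldr rX,[base,#offset]!: add the offset first, then access."""
    registers[base] += offset          # the base register keeps the new address
    return memory[registers[base]]     # the access uses base + offset
```

With r4 at $100 and the value 42 stored at $104, the load returns 42 and leaves r4 at $104.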

We have reached the end of the executable code and here we just enter an infinite loop by jumping to the same location forever.

Now all code is done and we reach the data. First we have two entries of colour data; black and white. These are the values that are written to the palette register.

And finally the graphics data. Same graphics as the other examples, except here we have indexed colour mode, so 0 means colour 0 in palette, and 1 means colour 1 in palette. Simple as that.

That is it. I’m quite fond of the ARM assembly language, it’s quite readable compared to, oh say PowerPC assembler code.

Links

There are many sites on the internet dedicated to programming the Gameboy and Gameboy Advance. Here are a few links that might be useful. Since the original version of my code was written over 10 years ago, I can’t give links to the resources I used initially, but I hope the links below fill your needs.

Hello Sega Master System

Long overdue, it’s time for another Hello World hack, and this time it’s for the 8bit console Sega Master System (SMS). Based on the Z80, it will be the first system in this series that is not based on the Motorola M68000 CPU. I learned to code the SMS in 2005, and have released two tiny demo hacks under the alias blind io (you can find them here if you really want to see them).

The major difference between the SMS and the Atari ST and Amiga is how the graphics hardware works. Most 8- and 16bit consoles are based on tiled graphics and sprites. What this means is that the screen is split into blocks of 8 by 8 pixels, and what tile is displayed in each of those blocks is read from a tile map. Sprites are also blocks of pixels, but can be moved freely around the screen. The number of sprites is limited and only a few of them can be displayed on the same scanline due to restrictions in the hardware.

The tool used to assemble the code to a ROM image is called WLA DX. It is a great tool and can assemble code and produce ROM images for several different platforms. In this post, I’m going to leave out most of the directives that tell the assembler about the output format and such, and focus on the Z80 code. For the full source, you can go to the project on bitbucket.

The code

To start off, we tell the assembler to place the following code at address 0. This is where the Master System starts executing once the logo has been displayed.

In interrupt mode 1, the CPU jumps to address $38 when an interrupt occurs. Only the VDP can trigger normal interrupts on the SMS, either every VBL and/or every scanline. Address $66 is where the CPU jumps when a non-maskable interrupt occurs. This happens to be connected to the Pause button on the Master System. For us, let’s just ignore the interrupts and return immediately.

Now we can do the thing we came here to do: show some graphics. First we must set up the VDP, which is the video controller. A bit further down in the listing, there is a section of register values for the video chip, and here is the code that writes these values to the control port of the VDP.

otir is an interesting command. It actually does several things. First off, it takes the byte pointed to by hl, outputs it to the port contained in register c and increases hl by one. Then it decreases the value in register b, and if the result is not 0, the program counter is changed to run the same command again.
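
Written out as a Python loop, following the order of steps described above, otir looks something like this. The out callback is a hypothetical stand-in for the hardware port, and the function returns the final values of hl and b.

```python
def otir(memory, hl, b, c, out):
    """Model of otir: output b bytes starting at hl to port c."""
    while True:
        out(c, memory[hl])       # output the byte at hl to the port in c
        hl = (hl + 1) & 0xFFFF   # increase hl by one
        b = (b - 1) & 0xFF       # decrease the counter
        if b == 0:               # repeat until b reaches zero
            break
    return hl, b
```

One detail the model makes visible: if b is 0 on entry, the counter wraps around and 256 bytes are written.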

Once the VDP configuration is done, we upload the graphics data tiles to the tile RAM inside the VDP. The address in the VDP where we want to place the tiles is $4020, which we output to the VDP control port ($bf). The highest two bits are in reality not part of the address, but tell the VDP that we want to write to the video RAM (VRAM). By using offset $20 in VRAM, we leave the first tile empty so all unused blocks on the screen are left black. Note: Running this on real hardware will probably leave some junk on screen since the content of the VRAM might not be initialized to 0.

Then we setup the registers for another otir instruction. The port to write data to the VDP is $be, and it’s quicker to decrease $bf by one than to write the new value to the register.

Now we have uploaded the tile data, but we must still set the colour palette. It is the same procedure as when we uploaded the graphics data, but the colour palette address is $c000 (actually the two highest bits indicate that we should write to colour memory (CRAM), and the following zeros indicate the offset in CRAM), and the number of bytes in the palette is 16.
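
Both $4020 and $c000 follow the same pattern: a 14-bit address with a command in the top two bits. Here is that pattern as a Python sketch. The names are made up, and it assumes the control port wants the low byte first.

```python
VRAM_WRITE = 0x4000   # top bits 01: write to video RAM
CRAM_WRITE = 0xC000   # top bits 11: write to colour RAM


def vdp_command(address, command):
    """Return the (first, second) bytes to send to the control port."""
    word = (address & 0x3FFF) | command
    return word & 0xFF, word >> 8
```

vdp_command(0x20, VRAM_WRITE) gives ($20, $40), which is $4020 sent low byte first, and vdp_command(0, CRAM_WRITE) gives ($00, $c0).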

One thing remains: we must tell the VDP what tiles to display at what position. Since we uploaded the tile data to tile numbers 1 to 6, we should write the values 1 to 6 to the first 6 entries in the tile map. Tile map entries are 16 bit, so we need to write a 0 every second byte. In the code below, we write 0 to register e and write that after we have written the value in register a, which we use to count from 1 to 6. The last thing we do is just loop forever.
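
The byte stream that ends up in the tile map can be sketched in Python like this. The function name is invented, and the sketch only covers tile numbers below 256, where the second byte of each entry is 0.

```python
def tile_map_bytes(first_tile, last_tile):
    """Return the bytes for 16-bit tile map entries first_tile..last_tile."""
    out = bytearray()
    for tile in range(first_tile, last_tile + 1):
        out.append(tile)  # the value counted in register a
        out.append(0)     # the zero kept in register e
    return bytes(out)
```

tile_map_bytes(1, 6) gives 12 bytes: 1, 0, 2, 0 and so on up to 6, 0.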

Here comes the VDP setup data. If you want to know what all these bits and values mean, I suggest you go to the development section of the SMS Power homepage in the link at the bottom of this post.

The palette is quite straightforward: RGB data with 2 bits per channel, which gives us 64 possible colours, and 16 entries. There are actually two different palettes you can use on the Master System, one for background tiles and one for the sprites. Since no sprites are used in this short sample code, we only set the tile palette.
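
A palette entry can be packed like this in Python. The sketch assumes that red sits in the two lowest bits, followed by green and then blue; the function name is made up.

```python
def sms_colour(red, green, blue):
    """Pack three 2-bit channels into one palette byte."""
    return (red & 3) | ((green & 3) << 2) | ((blue & 3) << 4)


all_colours = {sms_colour(r, g, b)
               for r in range(4) for g in range(4) for b in range(4)}
```

White is sms_colour(3, 3, 3), which is $3f, and the set holds the 64 possible colours.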

Here comes the tile data. It’s the same graphics as with the Atari and Amiga version, but it has been converted to tiles. The tiles are 8*8 pixels with four bitplanes where the first four bytes of data are the four bitplanes for the first row etc. This means that every tile is 32 bytes of data.
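
Converting a tile by hand is tedious, so here is the layout expressed as a Python sketch. It assumes that the leftmost pixel of a row ends up in the highest bit of each byte; the function name is invented.

```python
def encode_tile(rows):
    """Encode 8 rows of 8 colour indices (0-15) as 32 planar bytes."""
    out = bytearray()
    for row in rows:
        for plane in range(4):           # four bitplane bytes per row
            byte = 0
            for pixel in row:            # leftmost pixel becomes bit 7
                byte = (byte << 1) | ((pixel >> plane) & 1)
            out.append(byte)
    return bytes(out)
```

A row of eight pixels in colour 1 becomes the four bytes $ff, $00, $00, $00, and a whole tile is always 8 rows of 4 bytes, so 32 bytes.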

That is it really. For more info see the links below. I apologize for any errors and weird stuff in this blog post, it’s also 5 in the morning and I have written most of this post during the night.

The next installment in this blog series might come sooner than you think. 🙂

Links

  • Z80 CPU User Manual – All you need to know about the Z80
  • SMS POWER! – A goldmine for Sega Master System lovers.
  • MEKA – Emulator with debugging possibilities.
  • WLA DX – Multi platform cross assembler

Hello Amiga OCS

Weeks later than I had originally planned, the Amiga OCS is now greeted with a small Hello World sample. Since this is my first time coding the Amiga, most of the time was spent on reading various hardware documentation on what registers to set and why. As I have somewhat more experience with the Atari ST, I will comment on the difference between the two platforms as well as trying to explain what the Amiga code does.

After some trial and error coding, I decided to write a more system friendly version for the Amiga, to keep the code-build-test cycle as quick as possible. It took way too much time to restart the emulator after every test.

The Amiga and Atari are both based on the Motorola 68000 CPU family, so I am well-versed in the assembly language. The OS and other hardware, on the other hand, were not as easy to get to grips with.

The first thing to do in our little program is to save the current state of all the registers our program will change, so we can restore them to the same state when the program exits. For the address of the default copper lists (more info on the copper later) I load the graphics.library via the OldOpenLibrary system call, and get the addresses for the two lists.

The address to the exec library, which is the core OS module of the Amiga that handles loading of libraries and other basic needs, can be found at position $4 in the memory.

Load “graphics.library” by placing the address to library name string in CPU register a1, and jump to the “OldOpenLibrary” function at offset -408 from the exec library.

The return value will be in d0.

Now the handle to the graphics.library is in register a1, where we can use it as a pointer. Now it’s time to get the address of where we want to save the register data and save the copper list addresses in that memory area. When we are done with that, we can close the graphics library.

Since a2 pointer de-reference used the post-increment syntax, the content of a2 is counted up by the size of data that was written. We continue to use this syntax as we save some other registers that we need to change later on.

Now that we have saved the registers, we can move to the core business – Setup the screen and draw some data to it.

First off, let’s only enable DMA for bitplanes and copper. When you write to the DMA and IRQ registers with the highest bit (bit 15) cleared, the functionality corresponding to the set bits will be disabled. To enable functionality, you must write the mask for the functionality you want to set with the highest bit set as well. So writing $7fff will disable all, and $ffff will enable all.
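
The set/clear behaviour is easy to model in Python. The sketch below returns the new register content after a write. The names are made up, and the bit positions for master, bitplane and copper DMA are assumptions taken from the hardware documentation rather than from the code in this post.

```python
SET_CLR = 0x8000   # highest bit: 1 = set the given bits, 0 = clear them
DMAEN = 1 << 9     # assumed: master DMA enable
BPLEN = 1 << 8     # assumed: bitplane DMA
COPEN = 1 << 7     # assumed: copper DMA


def write_set_clear(register, value):
    """Return the register content after writing value to it."""
    mask = value & 0x7FFF
    if value & SET_CLR:
        return register | mask           # enable the masked bits
    return register & ~mask & 0x7FFF     # disable the masked bits
```

Writing $7fff clears everything, and a following write of SET_CLR | DMAEN | BPLEN | COPEN leaves only those three bits set.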

Reusing the graphics from the ST version, let’s copy our graphics to the screen. The Amiga screen is a bit different from that of the ST. The bitplanes of the Atari are interleaved, whereas on the Amiga, each bitplane is stored in one continuous block of memory.

Set up the copper data. Most of the data in the copper list has been prepared beforehand, but the address of the screen memory must be updated once the code is executing.

Write the address to our copper list data to the co-processor hardware register. Writing to the strobe register reloads the register and starts to execute the list.

Loop until the left mouse button is pressed.

Restore Registers. Like I mentioned before, the highest bit must be set to enable functionality.

Code is done! Now we have to add the graphics and data used by the program. The section statement below tells the assembler to put the data in the “chip memory”, which on the Amiga is memory that can be accessed by the DMA sub-system.

Define the lib_name variable as a string containing the name of the graphics library that we load at the beginning of the code. The even statement tells the assembler to place whatever follows on an even address, since reading 16- or 32-bit values from odd addresses results in an error.

Reuse the graphics from the Atari ST version. Since the Amiga and Atari ST both use bitplane graphics, the data can be reused without changes.

Finally we reach the copper list. The copper, which is kind of a good nickname for co-processor, is a processing unit that can execute a few simple types of instruction. The most common is probably writing data to a hardware register in the memory space of the hardware registers.

Just for fun I added a so called copper bar, or raster bar. To achieve a similar effect on the Atari, a lot more work is involved. Some day I might show you how it’s done on the ST.

The last entry in the copper list tells the copper to wait until it reaches a line that is below the end of the screen, and will therefore not do anything more.

BSS section where we reserve some memory for the registers we save. We also reserve enough memory for one bitplane at a resolution of 320*256, which is 320*256/8 = 10240 bytes.

That’s it. Assemble and run.

Disclaimer: I don’t know enough about the Amiga to say that this code does everything it’s supposed to do. It might have some side effects that I am not aware of. There might also be factual errors in the blog post. Feel free to point them out if you find any.

Links

Here are some useful links if you are interested in learning more and want to try some Amiga programming yourself.

 

Hello Atari ST

This is the first part of the Hello World Project.

As I’ve been an Atari ST owner since the late 80’s, and still code on my ST occasionally, it was an easy choice to begin with this platform. I could write most of this code without looking through documentation, with a few exceptions, like the OS function code number.

Before I begin to explain the code, I should tell you that this code is not very system friendly. You will not be able to exit the program; all you can do is reboot. After I wrote this code, I started looking at the Amiga and realized that it is quite handy to be more system friendly and exit the program in a kinder way. It will save you many reboots of the hardware/emulator while coding. It is very likely that I will return to this code later and fix those issues.

It is quite straightforward to get something drawn on the screen on the ST. Just set the address of your allocated screen memory and move some data there. But to access the hardware registers that hold the screen address and the palette, we have to run our code in supervisor mode. This is a special mode in the m68k processors where you can execute some privileged instructions and access protected memory areas. I will not go any deeper into the workings of the m68k processor family, but if you are interested in programming the Atari, Amiga or any other machine containing an m68k processor, I recommend that you download the Programmer’s Reference Manual.

One way to run our code in supervisor mode is to call the XBIOS function Supexec, which calls a function in supervisor mode. Since this code is not system friendly, I decided to just stop and loop forever once the function has been completed.

The pea super_run instruction pushes the address of the function super_run onto the stack, and the trap #14 instruction calls the XBIOS, which then calls our function before it returns.

The first thing to do in our function is to fix the screen memory address and set the screen base registers to point to our allocated memory. At the very end of the source code, there is a SECTION BSS entry, which contains a label screen_mem followed by a ds.b statement. The ds.b declares that we want to allocate space. The ST screen memory always consists of 32000 bytes (unless you are really advanced and do some hardware tricks to achieve overscan, but that is another story).

On an ST, the screen address must be aligned on 256 bytes. Therefore we allocate 256 extra bytes, add 255 to the original address and then clear the lowest byte. If you don’t understand why, grab a pen and a piece of paper and do some calculations yourself.
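If you prefer a keyboard to pen and paper, the same calculation can be tried in a few lines of Python. This is only a sketch of the arithmetic, and the function name is made up for this example.

```python
def align_256(address):
    """Round an address up to the next multiple of 256.

    Adding 255 pushes any unaligned address past the next boundary,
    and clearing the lowest byte snaps it back down onto that boundary.
    An address that is already aligned is left unchanged.
    """
    return (address + 255) & ~0xFF
```

The aligned address is never more than 255 bytes above the original one, so a buffer of 32000 + 256 bytes always has room for the whole screen.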

To set the screen base address, there are two byte-sized registers you must write to. As the memory address space of the ST is only 24 bits, and the lowest byte in the address must be 0, it is only the two middle bytes in the 32-bit address that are relevant. The higher of these should be written to $ff8201 and the lower to $ff8203. The last two instructions in the code block below take care of that. Remember, the m68k is a big endian system.
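Which bytes end up where can be sketched in Python like this. The function name is invented for illustration; the two register addresses are the ones mentioned above.

```python
def screen_base_bytes(address):
    """Split a 256-byte aligned, 24-bit screen address into the two
    bytes that go into the screen base registers."""
    high = (address >> 16) & 0xFF   # written to $ff8201
    mid = (address >> 8) & 0xFF     # written to $ff8203
    return high, mid
```

For a screen at $0f8000, the values written would be $0f and $80.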

Let’s move on to the colors. The ST palette is stored at the address $ff8240 and consists of sixteen 16-bit wide registers (a word in m68k lingo) with color data. Here is a little loop that sets all palette entries to black.

At address $ff8260 lies the screen mode register. By setting this to 0 we set the resolution to low res, which is 320 by 200 pixels.

Start copying data to the screen memory. In low res mode, the screen consists of four bitplanes, interleaved with one 16-bit word for each plane. If you have never used bitplanes, they can be a bit confusing at first, but since many of the old school platforms use bitplanes in one way or another, it can be a good idea to read up on them. I will not try to explain how they work just now.

The graphics I made use a single bitplane (only one color, so only one bit per pixel is needed), so we skip three bitplanes when copying data.
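To show what skipping three bitplanes means in practice, here is a Python sketch of the copy. It is not a translation of the original assembly loop: the function name and the use of a bytearray as screen memory are assumptions made for this example.

```python
def copy_one_plane(screen, gfx_words, width_words, height,
                   planes=4, line_bytes=160):
    """Copy single-bitplane graphics into plane 0 of an interleaved screen.

    screen      -- bytearray standing in for the 32000 bytes of screen memory
    gfx_words   -- list of 16-bit words, row by row
    width_words -- number of words per graphics row (48 pixels = 3 words)
    """
    for y in range(height):
        for w in range(width_words):
            word = gfx_words[y * width_words + w]
            # Each group of 16 pixels takes planes * 2 bytes on screen,
            # so the next word of plane 0 is 8 bytes further on.
            dest = y * line_bytes + w * planes * 2
            screen[dest] = word >> 8        # big endian: high byte first
            screen[dest + 1] = word & 0xFF
    return screen
```

The six bytes between each pair of written words belong to planes 1 to 3 and are left untouched.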

The screen base is set, our graphics are copied to the screen and the only thing that remains is to set palette entry 1 to our color of choice. Of course, if we had copied the graphics data to another bitplane than the first, we would have to change another palette entry.

On a plain ST the palette data consists of three bits for each RGB channel, stored in the lowest three nibbles. $700 would be red, $070 green and $007 blue.
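Packing a color into a palette word can be sketched in Python as follows. The function name is hypothetical, and the sketch only covers the plain ST format with three bits per channel.

```python
def st_color(r, g, b):
    """Pack three 3-bit channel values (0-7) into an ST palette word."""
    for channel in (r, g, b):
        if not 0 <= channel <= 7:
            raise ValueError("each channel must be in the range 0-7")
    return (r << 8) | (g << 4) | b
```

Each channel sits in its own nibble, so the highest bit of every nibble is simply left as 0.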

Define the graphics in a data section. The data consists of binary data, where one bit corresponds to one pixel. The image is 48 pixels wide and 6 pixels high.

And finally we declare some space for the screen memory that we referenced earlier. The difference between the DATA section and the BSS section is that everything in the DATA section will be stored as data in the binary file, whilst the BSS section will be allocated by the system at load time. There is a flag in the executable file header, the fast-load flag, that tells the OS how much memory to set to zero before the program starts. By default the OS clears all the memory it hands to the program, including the BSS section. If the fast-load flag is set, only the BSS section is cleared.

Any questions or complaints? Leave a comment!

Next up, Amiga OCS. In a future not so far away.

The Hello World Project

Since my time and energy are limited due to my full time day job, most projects I start have a tendency to stall before they really take off. I’ve been thinking about how I can find the motivation to actually do things when I have free time. Earlier this week I came up with an idea for the perfect project for me.

The Hello World Project

The idea I had was to make “Hello, World!” programs. That might not sound very hard, does it? No, and it is not very time consuming either. At least not in its simplest form. But this is not about making it easy.

I am going to write Hello World programs for old school computers and consoles, in assembly language. Instead of just calling an OS or BIOS function to write a string to the screen or console, I will access the hardware directly to output my Hello World graphics in whatever form I like, be it a bitmap, sprites or whatever else is available.

My interest in old school platforms and the demoscene is what inspired me to do this. I want to learn about the hardware of the platforms I grew up with and saw all those cool demos on. Once this project has started, people, including myself, will hopefully have some example code as a starting point for other projects, be it demos or games or anything else.

I will try to write blog entries for each platform, and the code will be available on the project page at bitbucket.

I hope to keep the motivation flowing!