## Introduction

This is a tutorial on 65816 ASM as used in the SNES, made easy for dumb people to understand (sorta). In case you are wondering, I don't program in this language, so it is possible that I will get something wrong in this tutorial. If so, you can e-mail me at tennj@yahoo.com to complain about how I suck at tutorials. Learning ASM isn't easy. If you already know a high-level programming language, the process will be a lot easier. If you already know a low-level programming language, then you may scrap my tutorial, as you probably won't need it. (This tutorial is for dumb people, remember?) The purpose of this tutorial is for you to learn the basics of ASM.

## CPU Registers / Flags

The 65816 CPU has a set of registers and flags. So what the hell is a register? You may think of a register as sorta a storage space, like a variable. Each register has its own purpose. The standard registers are A, X, Y, D, S, PBR, DBR, PC, and P. Now I will explain the usage of each register.

#### A - Accumulator

The accumulator is a general purpose math register. In other words, we can store anything we feel like into it and perform math operations on it. For example, if you want to store \$20 into the accumulator, you can do this:

`lda #\$20` (lda = Load Accumulator with value)

and then if you want to add 5 to it (assuming carry clear {<-more on this later}):

`adc #\$05` (adc = Add to accumulator with carry)

and the accumulator will have the value \$25 in it.
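Putting the two instructions together, a minimal sketch of the whole addition might look like the code below. It assumes you want an 8-bit accumulator and clears the carry first, both of which are explained further down.

```asm
sep #$20    ; make the accumulator 8-bit (sets the m flag)
clc         ; clear carry so adc does a plain add
lda #$20    ; A = $20
adc #$05    ; A = $20 + $05 = $25
```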

The accumulator can be either 8-bit or 16-bit. What I mean by this is that when the accumulator is 8-bit, it can only hold values from 0 to 255, but when it's 16-bit it can hold values from 0 to 65535. There is absolutely no way of telling whether the accumulator is 8 or 16 bits unless you check bit 5 of the P flag (again, more on this later). If bit 5 is set, then the accumulator is 8-bit; otherwise it's 16-bit.

#### X, Y - Index Registers

The X and Y registers are much like the accumulator. You can store values into them and perform simple math operations, such as incrementing and decrementing. However, they serve one additional purpose: they are used to index memory locations. The X and Y registers, like the accumulator, can be 8 or 16 bits, depending on bit 4 of the P flag (set means 8-bit).

#### D - Direct Page Register

The direct page register is a pointer that points to a region within the first 64k of memory. This register is used to access memory in direct addressing modes. In direct addressing mode, an 8-bit value (0-255) is added to the direct page register to form an effective address. For example, if the direct page register is `\$5000`, then:

`sta \$10` (sta = Store accumulator to address)

will store the accumulator to address `\$5000 + \$10 = \$5010`.
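As a rough sketch, this is one way the direct page register could be set up for the example above. The value `\$5000` comes from the example and the byte `\$42` is arbitrary; `tcd` copies the 16-bit accumulator into D.

```asm
rep #$20    ; 16-bit accumulator so it can hold a full 16-bit address
lda #$5000
tcd         ; D = $5000 (transfer the 16-bit accumulator to D)
sep #$20    ; back to an 8-bit accumulator
lda #$42    ; arbitrary example value
sta $10     ; direct addressing: stores $42 at $5000 + $10 = $5010
```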

#### S - Stack Register

The stack register points to a region where the stack is stored. So what is a stack? Think of it this way. Suppose you have a table. When you push a book onto the stack, you place the book on the table. Suppose you push another book, so you have 2 books on the table. Now when you pop a book, you remove the top book off the pile. So what the heck am I saying with all this push and pop crap? We can push values onto the stack and pop values off it. Every time we push a value onto the stack, the value is stored where the stack register is pointing and the stack register is decremented. When we pop a value from the stack, the stack register is incremented and the value is stored to the destination. E.g. suppose our stack is at `\$1FFF`, then

`pea \$1000` (pea = push effective address)

pushes `\$1000` onto the stack, and the stack pointer will be at `\$1FFD`, since `\$1000` takes up 2 bytes.

`pla` (pla = pop to accumulator <-assuming 16-bits)

will load the accumulator with \$1000 and restore the stack register to `\$1FFF`.
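Pushes and pops usually come in matched pairs. As a rough sketch, here is the common pattern of saving registers at the start of a routine and restoring them in reverse order at the end. The label `MyRoutine` is made up for the example.

```asm
MyRoutine:
    php         ; push P (this saves the 8/16-bit settings too)
    pha         ; push A
    phx         ; push X
    ; ... code that changes A and X, but leaves the register
    ;     sizes (m and x flags) the way they were when pushed ...
    plx         ; pop X (last pushed, first popped)
    pla         ; pop A
    plp         ; pop P
    rts         ; return to the caller
```

The pops have to mirror the pushes exactly, otherwise the values end up in the wrong registers and `rts` returns to garbage.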

#### PBR - Program Bank Register

The program bank register holds the current bank the code is running in. Usually, an absolute address passed to a JMP (Jump to location) or JSR (Jump to subroutine) is combined with PBR to form an effective address. eg.

`PBR: \$80` (The current value of PBR)

`jmp \$1C00` will jump to location `\$80:1C00`.

#### DBR - Data Bank Register

Like the PBR register, except that this register supplies the bank for data accesses. eg.

`DBR: \$90`

`lda \$8000` (Load accumulator from address)

loads a value from `\$90:8000`.
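There is no instruction that loads DBR with an immediate value, so one common way to set it is to go through the stack. A minimal sketch, assuming an 8-bit accumulator and the bank value from the example above:

```asm
sep #$20    ; 8-bit accumulator, so pha pushes exactly one byte
lda #$90
pha         ; push the bank number
plb         ; pop it into DBR, so DBR = $90
lda $8000   ; absolute addressing: loads from $90:8000
```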

#### PC - Program counter

This register holds the address of the current instruction. Together, `PBR:PC` forms the effective address of the current instruction.

#### P - Flag Register

The flag register is an 8-bit register that stores the state of the CPU. It can also tell you whether the accumulator and index registers are 8 or 16 bit. The layout of the flag register is:

``````_________________________________________________
|  n  |  v  |  m  |  x  |  d  |  i  |  z  |  c  |
-------------------------------------------------
n: Negative
v: Overflow
m: Memory/Accumulator Select
x: Index Register select
d: Decimal
i: Interrupt
z: Zero
c: Carry
``````

Each of the boxes above represents one bit in a byte, with bit 7 (`n`) on the left and bit 0 (`c`) on the right. Therefore, you must convert the value of the flag register to binary, then compare it to the layout above to check which flags are set. The carry flag is usually set on error or on an unsigned overflow. The `adc` (Add with carry) command performs addition to the accumulator and adds 1 more if carry is set. So to perform a pure add, you must clear the carry flag first.

``````clc (Clear carry flag)
``````

So if you were wondering what I was talking about earlier in the tutorial about carry, now you know. The `m` and `x` flags control whether the accumulator and index registers are 8 or 16 bits. When `m` is set, the accumulator is 8 bits; when it is clear, the accumulator is 16 bits. The `x` flag does the same for the index registers. Most assemblers cannot detect whether you are working with an accumulator of 8 or 16 bits, so it is up to you to keep track of the `m` flag as you program. Failure to do so will result in very hard to debug code. We can use `SEP` (Set P flag) and `REP` (Reset P flag) to set and clear flags. To make the accumulator 16 bits, we take the binary code `00100000` (the `m` bit), convert it to hex, `\$20` (decimal 32), and do

``````rep #\$20
``````

which clears the `m` flag and makes the accumulator 16-bit.
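To tie the flag bits to actual code, here is a small sketch of switching sizes with `rep` and `sep`. The bit masks follow the layout above (`m` is `\$20`, `x` is `\$10`); the loaded values are arbitrary.

```asm
rep #$20    ; clear m: the accumulator is now 16-bit
lda #$1234  ; immediate operand is 2 bytes
sep #$20    ; set m: the accumulator is now 8-bit
lda #$12    ; immediate operand is 1 byte
rep #$10    ; clear x: X and Y are now 16-bit
ldx #$0100
sep #$30    ; set m and x together: A, X, and Y are all 8-bit
```

Notice that the size of an immediate operand changes with the flags, which is why losing track of them produces such confusing bugs.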

## Addressing Modes

An addressing mode is how the processor interprets the operand of a command. In other words, we cannot say that:

`lda #\$20`

means the same as

`lda \$20`

The first loads the value `\$20` directly into the accumulator. The second loads a byte from the address `direct page + \$20`. Therefore, a lesson on addressing modes must be taught. In the examples here, the accumulator and index registers are assumed to be 8-bit.

First let's consider a few things in the syntax I use below.

`<exp>` is an expression.

A `<8-bit exp>` is an 8-bit expression. An 8-bit expression is any number between `\$00 - \$FF`.

A `<16-bit exp>` ranges from `\$0000 - \$FFFF`.

A `<24-bit exp>`, as you guessed, ranges from `\$000000 - \$FFFFFF`.

A `<dp exp>` is a direct page expression. A `dp` expression is an `<8-bit exp>`, but it refers to the direct page and is always written in hex with 2 digits. The direct page address is always calculated as `<dp exp> + D`.

An `<abs exp>` is a `<16-bit exp>` but is always written in hex and has 4 digits.

A `<long exp>` is a `<24-bit exp>` but is 6 digits and also written in hex.

#### Immediate

`opcode #<8/16-bit exp>`

In immediate addressing mode, the operand is the value itself. eg.

``````lda #\$FF    ; Loads accumulator with \$FF.
sep #\$30    ; Sets the m and x bits (\$30) in P
``````

#### Direct

`opcode \$<8-bit exp>`

The effective address is formed by adding the direct page register to the 8-bit address. eg.

``````lda \$20    ; Loads from \$20 + D
lda \$90    ; Loads from \$90 + D
``````

If `D = \$1000`, then the first instruction will read a byte (if A is 8-bit) from address `\$1020`.

#### Absolute

`opcode \$<16-bit exp>`

The effective address is formed by `DBR:<16-bit exp>`. eg.

``````DBR: \$88
lda \$8000    ; Loads a byte from \$88:8000
``````

#### Absolute Long

`opcode \$<24-bit exp>`

The effective address is the expression. eg.

``````lda \$808000    ; Loads a byte from \$80:8000
lda \$FF9090    ; Loads from \$FF:9090
``````

#### Accumulator

`opcode`

The destination is the accumulator. eg.

``````inc    ; Increments the accumulator
``````

#### Implied

`opcode`

The opcode has its own special function. eg.

``````clc    ; Clears carry flag
inx    ; Increments the X register
``````

#### Direct Indirect Indexed

`opcode (\$<dp exp>), y`

This is where the index registers come into play. 2 bytes are loaded from the direct page address to form a 16-bit address, which is combined with DBR to form the base address. Finally, Y is added to the base address to form the effective address. eg.

``````DBR: \$80,    D: \$0020,    Y: 0001
Memory dump:
0030:    30 40 23 22 F4 22 23 1C
0038:    23 2D DD F4 FF FF FF FF

lda (\$10), y
``````

First we will calculate the DP address:

``````\$10 + D = \$0030
``````

Then pull 2 bytes from `\$0030` (`30` and `40`), which are reversed because the low byte is stored first, to get the address `\$4030`. This address is combined with DBR to get the base address.

``````DBR:\$4030 -> \$80:4030
``````

``````base + y = \$80:4030 + \$0001 = \$80:4031
``````

Basically, the command loads a byte from \$80:4031 to the accumulator. Usually, this mode is used in a loop where Y is incremented each time to pull a set of data from a memory location. eg2.

``````DBR: \$80,    D: \$0020,    Y: 0001
Memory dump:
0030:    30 40 23 22 F4 22 23 1C
0038:    23 2D DD F4 FF FF FF FF

lda (\$15), y

DP Address        = \$0020 + \$15 = \$0035
Base Address      = DBR:<2 bytes from \$0035>
                  = \$80:2322
Effective Address = \$80:2322 + Y = \$80:2323
``````

P.S. I may have made this look more complicated than it is. Most of the time the D register is 0, so calculating all of this takes a lot less time.
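As a rough sketch of the kind of loop mentioned above, the code below copies 8 bytes from one place to another through two pointers. It assumes 16-bit pointers have already been stored at direct page `\$10` (source) and `\$12` (destination); the label `Loop` and the byte count are made up for the example.

```asm
    sep #$30        ; 8-bit A, X, and Y
    ldy #$00        ; start at offset 0
Loop:
    lda ($10),y     ; read a byte through the pointer at dp $10
    sta ($12),y     ; write it through the pointer at dp $12
    iny             ; move on to the next byte
    cpy #$08        ; copied 8 bytes yet?
    bne Loop        ; if not, go around again
```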

#### Direct Indirect Indexed Long

`opcode [\$<dp exp>], y`

This mode is like the previous addressing mode, except that rather than pulling 2 bytes from the DP address, it pulls 3 bytes to form the base address, so DBR is not used. eg.

``````(Same example as last time)
DBR: \$80,    D: \$0020,    Y: 0001
Memory dump:
0030:    30 40 23 22 F4 22 23 1C
0038:    23 2D DD F4 FF FF FF FF

lda [\$10], y

DP Address = \$0020 + \$10 = \$0030
``````

The base address is formed by pulling 3 bytes from `\$0030` (`30 40 23`), which become the address `\$23:4030`. Now we add `Y` to the base address to form the effective address `\$23:4031`.
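The 3-byte pointer has to be put into the direct page by somebody first. A minimal sketch of doing that by hand, assuming we want to read from the made-up address `\$7E:8001` and that direct page `\$10` to `\$12` is free to use:

```asm
    rep #$30        ; 16-bit A, X, and Y
    lda #$8000
    sta $10         ; pointer low and high bytes go to dp $10 and $11
    sep #$20        ; 8-bit A
    lda #$7E
    sta $12         ; pointer bank byte goes to dp $12
    ldy #$0001
    lda [$10],y     ; loads a byte from $7E:8000 + 1 = $7E:8001
```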

#### Direct Indexed Indirect

`opcode (\$<dp exp>, x)`

The direct page address is calculated and X is added to it. 2 bytes from the resulting address, combined with DBR, form the effective address. eg.

``````DBR: \$80    D: \$0020    X: \$0004
Memory dump:
0020:    FF 00 FF 09 33 33 09 88
0028: 08 76 66 36 D7 23 99 00

lda (\$02, x)
``````

``````DP Address = \$02 + D = \$0022
``````

``````dp address + x = \$0026
``````

2 bytes are pulled from \$0026, (09 88) to become \$8809, and combined with DBR:

``````DBR:\$8809 = \$80:8809
``````

which is the effective address the byte is loaded from.

#### Direct Indexed by X

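`opcode \$<dp exp>, x`

The direct page address is calculated and X is added to it to form the effective address. eg.

``````D: \$0020    X: \$0004

lda \$30, x

DP Address        = \$30 + \$0020 = \$0050
Effective Address = \$0050 + X = \$0054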
``````

#### Direct Indexed by Y

`opcode \$<dp exp>, y`

Same as above except we add the Y register instead of X.

#### Absolute Indexed by X

`opcode \$<abs exp>, x`

The absolute expression is added with X and combined with DBR to form the effective address. eg.

``````DBR: \$80    X: \$0001

lda \$8000, x
Effective address = DBR:(\$8000 + x) = \$80:8001
lda \$6988, x
Effective address = DBR:(\$6988 + x) = \$80:6989
``````
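This mode is the bread and butter of working with tables and buffers. A minimal sketch, assuming there is a 16-byte buffer at the made-up address `\$0200` in the current data bank that we want to fill with zeros:

```asm
    sep #$30        ; 8-bit A, X, and Y
    lda #$00        ; value to write
    ldx #$00        ; X counts from 0 up to 15
Loop:
    sta $0200,x     ; write to DBR:($0200 + X)
    inx
    cpx #$10        ; done all 16 bytes?
    bne Loop        ; if not, go around again
```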

#### Absolute Long Indexed by X

`opcode \$<long exp>, x`

The effective address is formed by adding the `<long exp>` with `X`. eg.

``````X: \$0001

lda \$808000, x    ; loads a byte from \$80:8001.
lda \$589112, x    ; loads from \$58:9113
``````

#### Absolute Indexed by Y

`opcode \$<abs exp>, y`

Same as Absolute Indexed by X, except with Y.

#### Program Counter Relative

`opcode \$<8-bit signed exp>`

This addressing mode is only used in branch commands. The `<8-bit signed exp>` is added to PC (program counter) to form the new location of the jump. The `<8-bit signed exp>` can range from `-128` to `127`. Most assemblers will allow you to enter an `<abs exp>` or a label, from which the signed offset is automatically calculated. eg.

``````bra \$8005    ; branch to location \$8005 as long as it's within the (-128 to 127) range
``````
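In practice you rarely write the target address by hand. A minimal sketch, assuming your assembler supports labels (the names `Loop` and `Done` are made up here), lets the assembler work out the signed offset for each branch:

```asm
    sep #$10        ; 8-bit X and Y
    ldx #$05        ; loop 5 times
Loop:
    dex             ; X = X - 1, sets the z flag when X reaches 0
    bne Loop        ; backward branch (negative offset) while X is not 0
    bra Done        ; forward branch (positive offset), always taken
    inx             ; never executed
Done:
    nop             ; execution continues here
```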

#### Program Counter Relative Long

`opcode \$<16-bit signed exp>`

Like above, but the offset is 16 bits, so the range is `-32768` to `32767`. Only the BRL and PER commands use this.

#### Absolute Indirect

`opcode (\$<abs exp>)`

2 bytes are pulled from the `<abs exp>` to form the effective address. eg.

``````Memory dump:
0000:    90 77 78 00 43 00 00 00
0008:    33 32 12 33 11 11 FF FF

jmp (\$0008)
``````

will first grab 2 bytes from `\$0008 (33 32)`, then jump to the address of `\$3233`.

#### Absolute Indexed Indirect

`opcode (\$<abs exp>, x)`

X is added to the `<abs exp>`, then 2 bytes are pulled from that address to form the new location. eg.

``````X: 0001
Memory dump:
0000:    90 77 78 00 43 00 00 00
0008:    33 32 12 33 11 11 FF FF

jmp (\$0008, x)
Abs Address = \$0008 + X = \$0009
2 bytes from \$0009 = 32 12
\$1232 is the new location.
``````
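The classic use for this mode is a jump table. Below is a rough sketch, assuming an assembler that uses `.dw` to emit 16-bit words (yours may spell it `dw` or `.word`) and that the table sits in the same bank as the code. The names `Table`, `RoutineA`, and `RoutineB` are made up for the example.

```asm
    rep #$30        ; 16-bit A, X, and Y
    lda #$0001      ; index of the routine we want (0 or 1)
    asl             ; each table entry is 2 bytes, so double the index
    tax
    jmp (Table,x)   ; jump to the address stored at Table + X

Table:
    .dw RoutineA    ; entry 0
    .dw RoutineB    ; entry 1

RoutineA:
    rts
RoutineB:
    rts             ; index 1 lands here
```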

#### Direct Indirect

`opcode (\$<dp exp>)`

2 bytes are pulled from the direct page address to form the 16-bit address. It is combined with DBR to form a 24-bit effective address. eg.

``````D: 0000    DBR: \$80

Memory Dump:
0040: 50 00 80 00 22 23 33 44
0050: 60 21 21 21 22 55 55 66

lda (\$40)
DP Address = \$40 + D = \$40
16-bit address from \$40 = \$0050
Effective Address = DBR:\$0050 = \$80:0050
``````

#### Direct Indirect Long

`opcode [\$<dp exp>]`

3 bytes are pulled from the direct page address to form an effective address. eg.

``````D: 0000    DBR: \$80

Memory Dump:
0040: 50 00 80 00 22 23 33 44
0050: 60 21 21 21 22 55 55 66

lda [\$40]
DP Address = \$40 + D = \$40
24-bit address from \$40 = \$80:0050 (bytes 50 00 80)
Effective Address = \$80:0050
``````

#### Stack

`opcode`

Like implied but affects the stack. eg.

``````pha    ; push Accumulator
pla    ; pop accumulator
``````

#### Stack Relative

`opcode <8-bit exp>, s`

The stack register is added to the `<8-bit exp>` to form the effective address. eg.

``````S: 1FF0

lda 1, s    ; loads a byte from 1 + S = \$1FF1.
``````
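Stack relative addressing is handy for reading values that were pushed before a subroutine call. A minimal sketch, assuming a 16-bit accumulator; the labels `Caller` and `GetParam` and the value `\$1234` are made up for the example.

```asm
Caller:
    rep #$20        ; 16-bit accumulator
    pea $1234       ; push a 2-byte parameter
    jsr GetParam    ; jsr pushes a 2-byte return address on top of it
    pla             ; pop the parameter off again to rebalance the stack
    rts

GetParam:
    lda 3,s         ; 1,s and 2,s hold the return address,
                    ; so the parameter starts at 3,s: A = $1234
    rts
```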

#### Stack Relative Indirect Indexed

`opcode (<8-bit exp>, s), y`

The `<8-bit exp>` is added to S to form the address of a pointer on the stack. 2 bytes are pulled from that address and combined with DBR to form the base address. Y is added to the base address to form the effective address. eg.

``````S: \$1FF0    Y: \$0001    DBR: \$80

lda (1, s), y
Pointer address   = \$1FF0 + 1 = \$1FF1
Base address      = DBR:<2 bytes from \$1FF1>
Effective Address = base address + Y
``````

#### Block Move

`mvn \$<8-bit exp>,\$<8-bit exp>` `mvp \$<8-bit exp>,\$<8-bit exp>`

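This is by far the weirdest instruction I've seen. It basically moves a block of bytes from one place to another. In the syntax used here, the first `\$<8-bit exp>` is the bank of the destination and the second is the bank of the source (some assemblers expect the operands the other way around, so check yours). X is loaded with the 16-bit address of the source, and Y is loaded with the 16-bit address of the destination. A is loaded with the number of bytes to transfer minus 1. `mvn` starts at the first byte of each block and increments X and Y as it goes; `mvp` decrements them instead, so for `mvp` X and Y must point at the last byte of each block. eg.

``````X: ????  Y: ????  A: ????    rep #\$30       ; make accumulator and index registers 16-bit
X: ????  Y: ????  A: ????    ldy #\$8000     ; load Y with \$8000 (destination address)
X: ????  Y: 8000  A: ????    ldx #\$9000     ; load X with \$9000 (source address)
X: 9000  Y: 8000  A: ????    lda #\$0004     ; load A with the byte count minus 1
X: 9000  Y: 8000  A: 0004    mvn \$80, \$A0   ; block move, incrementing X and Y
will transfer 5 bytes from \$A0:9000 -> \$80:8000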
``````

## Opcode Reference

Here is a simple explanation of most of the commands in ASM.

Edit: Merged this into the 65816 Reference. See there for more info.

## FAQ's

Q: Since the accumulator can be 8 or 16 bits, how do disassemblers know which one it is?

A: They don't. Tracer has an option that will attempt to detect the accumulator and index sizes. Use the -f switch.

Q: Sometimes the assembler assembles my JMP into a JML instruction. What should I do?

A: Try jmp.w

Q: OK. I got down all the basics. Does this mean I'll be able to hack SNES ROMs like all the other groups out there with my newly gotten ASM skills?

A: No. You must learn about the hardware. I don't cover this.

Q: Are you the coolest?

A: Yes.

Q: What about Tiger Claw? Ain't he cool too?

A: Of course.

Written by Jay