SNES Development
ASM Tutorial Part 2

Are you familiar with all the previous lessons yet? If so, that’s great. With all of that knowledge you can already do a lot of cool stuff, but you can always get better at ASM and make awesome stuff by learning more. So here’s Part 2 of the tutorial, which for now only explains a bit of intermediate stuff. Don’t worry, everything will be explained easily.

Lesson 6: Indexing

So… you know you can write to the accumulator, or the ‘A’ register, right? Well, guess what? There’s an X and a Y register you can work with, too. Isn’t that cool? You have 3 registers you can use… and the X and Y registers work similarly to the A register.

LDA         LDX          LDY
STA         STX          STY
CMP         CPX          CPY
INC         INX          INY
DEC         DEX          DEY

Those are the X and Y equivalents of using some opcodes. All of them work in the same way as A, for example:

LDA #$38
STA $00

Can be:

LDX #$38
STX $00

Same goes for Y. You just need to know that INX / DEX / INY / DEY work a bit differently.

INC can directly increment a RAM Address, like this: INC $0DBF. However, you can’t do INX $0DBF or INY $0DBF, as these commands do not exist; INX and INY only ever increase the X or Y register itself. The best method would be to use it like this:

LDX $0DBF
INX
STX $0DBF

And the same goes for the Y register. And note that branching commands work in the same way as A for X and Y, for example:

LDY $00    ;if $00 = 1 ..
CPY #$01
BEQ Return ; Return.
LDY #$01
STY $19    ; I'm sure you know what this does.
Return:
RTL

…but now you might be thinking: what’s the use of the X and Y registers if there’s already an ‘A’ one? And with that INX code I made above, you also might be thinking that it just wastes more bytes. Well, those aren’t the main uses of X and Y. Instead, X and Y are mainly used for tables and indexing.

What’s a table? It’s basically a label followed by a bunch of bytes. It’s set up like this:

Table:
db $04,$08,$0B,$1C

You write a label/symbol normally, but afterwards, you write a “db” followed by a bunch of values separated by commas. The values don’t need a # in a table. Here’s a table with 6 bytes:

Table:
db $3F,$1A,$2C,$01,$08,$24
;   00  01  02  03  04  05

It’s 6 bytes long, but the first byte counts as number 00, so the last one is number 05. Also note that in a table, you can write whatever values you want. I just wrote those random ones. Just note that they need to be set up in the proper way:

Symbol name in front of the table, dollar sign in front of each number (no # sign or spaces, ever), a comma separating each number and no comma at the end. If you have set up a table like this:

Table:
db #$08, #$06, #$08

That’s incorrect. You don’t ever need to put a # or a space.

Anyway, so you’ve made a table? Big deal, where’s the X and Y in it? Well, now that we’ve made one.. we actually need to utilize it.

To use it, you need to load a RAM Address indexed by X or Y. Suppose we want to make the star timer have a different timer depending on Mario’s power-up state. That would be simple to create using a table.

From the beginning of our tutorial,

00 - Small Mario
01 - Big Mario
02 - Caped Mario
03 - Fiery Mario

And $1490 controls the star timer. #$FF is the maximum value it can be. With your current knowledge, a code like this will be made:

    LDA $19
    CMP #$00
    BEQ TimerA   ; for small mario
    CMP #$01
    BEQ TimerB   ; for big mario
    CMP #$02
    BEQ TimerC   ; for caped mario
    CMP #$03
    BEQ TimerD
    RTL
TimerA:
    LDA #$1C     ; load a timer 
    BRA SetTimer ; and branch to set it.. for small mario.
TimerB:
    LDA #$28
    BRA SetTimer ; for big Mario
TimerC:
    LDA #$38
    BRA SetTimer ; for caped Mario
TimerD:
    LDA #$56
SetTimer:        ; we don't need a BRA here, because the label and code are already present.
    STA $1490    ; store to star timer.
    RTL          ; and return.

..long isn’t it? Not only that, but it’s a garbage code which can be optimized using tables and indexing.

Firstly, we need to load Mario’s power-up status in the X or Y register. I will be using X for mine:

LDX $19

Now, we need to load a table, which is dependent on “X”, which is Mario’s power-up. Loading a table is simple:

Table:
db $08,$1C,$24,$48

LDX $19
LDA Table

But the thing is, we want to load a different value depending on $19, so we use different bytes in a table. Remember how I said the first byte is 00, second is 01, and so on? The bytes in our table can control our star timers:

Table:
db $08,$1C,$24,$48
;   00  01  02  03

If we’re using $19 here (I’ll explain how you make the table use $19 later), we’re loading #$08 for small Mario, #$1C for Big Mario, #$24 for Caped Mario and #$48 for Fire Mario. If there was a fifth value, we could extend our table by one more byte for that value, but there isn’t, so we’ll leave this table at 4 bytes long.

Now that we’ve set up the table, we need to make it use $19. I’ve already loaded that into X in my code, and I can make the table easily use X with this code:

Table:
db $08,$1C,$24,$48

LDX $19
LDA Table,x ; load the table, indexed by power-up ($19)

See how this works? To get a table indexed by X, you use a comma after the table name, followed by an X. Now this will make the table use values loaded into X, which is the power-up state.

Using Y is similar, you just need to replace LDA Table,x with LDA Table,y and make sure that you’ve loaded $19 into Y, not X!

So… now that we’ve made the table use values from $19 by indexing it with X, we need to store those values to the star timer, $1490. That’s very simple; just add an STA $1490 after loading the table. The table does the rest of the work:

Table:
db $08,$1C,$24,$48

LDX $19
LDA Table,x ; load the table, indexed by power-up ($19)
STA $1490   ; get the different values into $1490.
RTL         ; return, of course.
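
One thing the snippets above gloss over: if the code runs from the top, the table bytes get hit first and are executed as if they were opcodes. A minimal sketch of one way to lay it out, assuming this is a routine ending in RTL like the others here and that A, X and Y are 8-bit, is to put the table after the RTL. The CPX / BCS check is my own addition (BCS branches when X is greater than or equal to the compared value), there in case $19 ever holds something bigger than 03. StarTimers is a made-up label name:

```asm
    LDX $19           ; X = power-up state (00-03)
    CPX #$04          ; is X 04 or higher? (not a valid index for this table)
    BCS Return        ; then skip, rather than reading past the end of the table
    LDA StarTimers,x  ; A = table byte number X
    STA $1490         ; store it to the star timer
Return:
    RTL               ; the code ends here, so the table below is never run

StarTimers:           ; hypothetical label, same values as Table above
db $08,$1C,$24,$48
```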

See how we’ve turned a code of over 20 lines into a simple 4 line one with a table? That’s how awesome the X and Y registers are. So.. what this does, explained once more:

We’re actually loading a table loaded with “star timers” for the current power-up, $19. The first value is for small Mario (#$08), the second for Big Mario (#$1C)… and so on. Now, since there’s only 4 possible power-up states (00, 01, 02, and 03), our table only has to have 4 bytes. That is, 4 values are only needed in this table.

Suppose you have an address with more values though. Imagine there being 8 power-up states in SMW (00-07). If #$04 in $19 were the value for a hammer power-up, we could extend our table by one more value (byte):

Table:
db $08,$1C,$24,$48,$60

As you can see, I added a $60 which would be the value for power-up #$04. If you’re indexing with a RAM Address with more values, you use the amount of values that RAM Address has. Complicated? Look:

$0DBF is the coin counter. How many values do you expect it to have? 100, right? It starts with 00, and goes all the way to 99 (#$63 in hex), and then resets back. If you’re using this for your star timers, your table HAS to be 100 bytes long. Yes, that means it will have 100 values; tables can be that big. Hell, the level table would be even larger than that.

So we load our table into “A”, indexed by “X”. What does this mean? LDA Table,x takes the number currently in X, counts that many bytes forward from the start of the table, and loads the byte it finds there. If we did LDX $19 first, X holds Mario’s power-up, so the table picks out the value for that power-up. Simple, really.

X contains the value that was in the RAM Address at the moment you loaded it (not the address itself), so it’s important that you load the right RAM Address into X before using the table.

Y works the same way. Here’s the code in case you want to use the Y register:

Table:
db $08,$1C,$24,$48

LDY $19
LDA Table,y ; load the table, indexed by power-up ($19)
STA $1490 ; get the different values into $1490.
RTL ; return, of course.

Now… some important tips to know:

  • A table’s values here are written in hex, with a $ in front. Leave the $ out and the assembler reads the number as decimal.
  • A value in a table cannot go beyond $FF (unless you’re working with 16-bit values, which will be explained later).
  • A table MUST be set up properly.
  • You can use X or Y, whichever one you wish. Just remember that if you’ve already got something in X that you still need, it’s obviously better to use Y instead.
  • Make sure that the table has the right amount of values! If you have a table with values for level numbers, it will be huge, around 100 bytes or so! Make sure each and every value is there! If you’re using sprites, your table should have #$FF values for each sprite!
  • If you loaded your index into the X register, don’t read the table with Y. That’s silly.

What if you want to add or subtract different values? It works in the same way:

Table:
db $01,$02,$03,$04
;   00  01  02  03

LDX $19 ; get Mario's power-up .. into X.
LDA $0DBF ; his coins into A.
SEC
SBC Table,x ; subtract different values depending on X (power-up)
STA $0DBF ; and store new value.
RTL ; return.

It may look a bit harder, but really it’s just as simple as LDA Table,x. It makes sense that when subtracting, you need to be indexing the SBC rather than the LDA right?

When adding, you’ll do pretty much the same thing. Replace the SEC SBC Table,x with CLC ADC Table,x instead.
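
Written out in full, the adding version could look like this. It’s only a sketch: CoinBonus is a made-up label, I’ve put the table after the RTL so it never gets run as code, and nothing here stops the coin counter from going past 99:

```asm
    LDX $19           ; get Mario's power-up into X
    LDA $0DBF         ; his coins into A
    CLC               ; clear carry before adding
    ADC CoinBonus,x   ; add a different value depending on X (power-up)
    STA $0DBF         ; and store the new coin count
    RTL               ; return

CoinBonus:            ; hypothetical table: coins to add for each power-up
db $01,$02,$03,$04
```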

Heh… there’s a lot of other stuff you can do with indexing, such as STA $RAM,x. Now… let’s learn some more about X and Y.

What if you wanted to reach RAM $07 starting from $04? You could do that through indexing as well.

LDX #$03
LDA $04,x ; this will load RAM $07.

You might not find it useful unless you make some sort of a loop. Here’s a code that draws tile #$FC into $0EF9, $0EFA and $0EFB.

    LDX #$02    ; X will contain the value to add.
Loop:
    LDA #$FC    ; load tile FC.
    STA $0EF9,x ; the first time round this stores #$FC to $0EF9 + 2, so it's actually storing to $0EFB.
    DEX         ; X goes down by one, so it's #$01 now.
    CPX #$FF    ; has X gone below 00? (00 - 1 wraps around to FF)
    BNE Loop    ; if not, go back to the loop and store again.
    RTL

It may seem a bit confusing.. but this is what we’re doing:

First, we’re loading 02 into X. Then we load a tile (#$FC) into A, and store it to a RAM Address ($0EF9). However, since we’re storing to $0EF9,x we are actually storing it to $0EF9 + 2, which is $0EFB. So instead of storing to $0EF9, we are storing #$FC to $0EFB.

But then, we add a DEX, meaning decrease X by one. So X is now 01. Since X isn’t FF yet, it’ll go back to the loop. What do you realize now? It’s actually storing tile #$FC to $0EF9 + 1 this time! That’s because we added a DEX command.

Finally, X becomes 00 and it loops once more to store to $0EF9 itself. The next DEX wraps X around from 00 to FF, and the code ends. If we had compared with #$00 instead, the loop would stop one step too early and $0EF9 would never get written. Simple, I guess? And by the way, $0EF9-$0EFB are places on the status bar which you can write to.

So… yeah, there’s a ton of things you can do with indexing, although at times it can be confusing. Just be sure that you know what you’re doing.
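
Tables and indexed stores go together nicely. Here’s a rough sketch (my own example, not something from the game) that copies 3 bytes from a table into $0EF9-$0EFB. The label Tiles and its values are made up. BPL is an opcode that branches as long as the last result was between 00 and 7F, so the loop stops once X drops below 00 and wraps to FF. It assumes X is 8-bit:

```asm
    LDX #$02          ; start at the last byte (3 bytes: index 02, 01, 00)
Loop:
    LDA Tiles,x       ; read byte number X of the table...
    STA $0EF9,x       ; ...and write it to $0EF9 + X
    DEX               ; next index down
    BPL Loop          ; keep going while X is 00 or higher (stops once X wraps to FF)
    RTL

Tiles:                ; hypothetical tile numbers, one per status bar slot
db $FC,$FC,$FC
```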

Here’s some other nice commands you can use:

TAX -> This copies the value in A to X.
TAY -> This copies the value in A to Y.
TXY -> This copies the value in X to Y.
TYX -> This copies the value in Y to X.
TYA -> This copies the value in Y to A.
TXA -> This copies the value in X to A.

Should be pretty obvious… (Note: they don’t move the value, they copy it, so the register you copied from still has it). Anyway,

LDA $010A
TAX

Will copy the value stored in $010A to X. It’s important to know that X gets whatever number is sitting in $010A at that moment, not the number $010A itself. Although you can directly do LDX $010A here, TAX (or TAY) is what you need when loading from a 24 bit RAM Address into X or Y (6 digit address, $xxxxxx), since you can’t directly do:

LDX $long
LDY $long

($long means a RAM Address with 6 digits, like $7F8620 for example).

These commands don’t exist, but LDA $long does. So if you wanted to get the value of a 24-bit address into X or Y, load it into A first, and then TAX or TAY!

LDA $long
TAX

This does the job that LDX $long would have done, if it existed.

Another use could be that you might want to “preserve” a value. If you need A for something else, you can always TAX it, and TXA when you want it back (as long as nothing changes X in between). Oh.. and there’s a small drawback with the Y register.. you cannot do: LDA $long,y

This command does not exist, although LDA $RAM,x (RAM is a 24-bit address) does. In case you want to do that, TYX is your friend.
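
As a small sketch of that, assuming the index you need is sitting in Y and $7F8620 is the example long address from above. TYX overwrites whatever was in X, so this version tucks X away with PHX first and gets it back with PLX (both are covered in Lesson 8). Drop those two lines if you don’t care about X:

```asm
    PHX               ; keep the old X safe on the stack
    TYX               ; copy the index from Y into X
    LDA $7F8620,x     ; now the long address can be indexed
    PLX               ; get the old X back (PLX does not change A)
```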

Lesson 7: Bitwise Operations

By looking at the lesson name, you might think “oh no, bitwise, what the hell is that, it’s going to be hard”. Well, this stuff isn’t really hard at all. Bits are an important part of assembly, and you can use them for many purposes.

One for example, would be this. Imagine you make 3 power-ups - Dash boots, double jump, and Ice Mario. You do this by setting 3 different values of a RAM Address:

$0DC6
00 - No power-up.
01 - Dash boots
02 - Double jump.
03 - Ice Mario.

Yeah, this doesn’t really exist in SMW, but imagine with all of the current ASM knowledge you have now, you made these power-ups.

Now, there’s a power-up that sets $0DC6 (your power-up RAM) to 01 - the Dash boots. When you collect them, imagine you have it.

Then, there’s power-up 2, the double jump boots (yeah, just imagine it). If you have the dash boots, and then collect the double jump boots, you’ll have the double jump ability, but you lose the dash boots ability. Why? Because $0DC6 is no longer set to 01 but 02 now.. this means that you can’t have your dash ability AND your double jump ability together, unless you merge both codes together. But then, that’ll totally make the power-up ruin the fun out of playing, because the player can easily use 2 power-ups at once.

To overcome this problem, you can use bits. What exactly are bits? We’ll learn them, including binary..

Binary

You might know, that in computers, everything is stored as 0s and 1s. This number system is called binary, it only has 0s and 1s, nothing else.

..yeah, you might be wondering what the heck these 0s and 1s have to do with SMW hacking. We’ll come to that in a minute.

Anyway, let’s take a random hex value and convert it to binary. #$02 will become 00000010. There are 8 digits in this binary number. Each of these is called a ‘bit’. So, you’ll realize that there are 8 bits in a byte.

In case you don’t know what a byte is yet, it’s a value like #$02. (I’m pretty sure you will by now..)

Yeah, so, 8 bits in a byte - each being either a 0 or a 1. There’s a fine difference between a 0 and a 1 here:

0 - Bit clear.
1 - Bit set.

But what exactly do clear and set mean? Well, clear means that this bit is empty, and it’s not being used. 1 means that the bit is being used and is not empty.

Think of 0 as the ‘off flag’ and 1 as the ‘on flag’.

So that should be clear by now.. 8 bits in a byte and each bit can either be a 0 or a 1, 0 being a clear bit and 1 being a set bit. Now here’s something important to know: the bits are counted in reverse order. Take 00101010 for example. The first bit is the last digit, and the last bit is the first digit. (This tutorial counts them from 1 to 8; many other documents count the same bits from 0 to 7, so their “bit 0” is our bit 1.) Splitting this up from the beginning, we have:

0 - Bit 8.
0 - Bit 7.
1 - Bit 6.
0 - Bit 5.
1 - Bit 4.
0 - Bit 3.
1 - Bit 2.
0 - Bit 1.

This is very important to know, don’t get confused by it. But how do you make a bit set? And what’s so special if it’s set? Well, let’s carry on.

01 - Bit 1 set.
02 - Bit 2 set.
04 - Bit 3 set.
08 - Bit 4 set.
10 - Bit 5 set.
20 - Bit 6 set.
40 - Bit 7 set.
80 - Bit 8 set.

The numbers on the left are hexadecimal values, and they follow a simple pattern:

Bit 1 = 2^0 = 1 = $1
Bit 2 = 2^1 = 2 = $2
Bit 3 = 2^2 = 4 = $4
Bit 4 = 2^3 = 8 = $8
Bit 5 = 2^4 = 16 = $10
Bit 6 = 2^5 = 32 = $20
Bit 7 = 2^6 = 64 = $40
Bit 8 = 2^7 = 128 = $80

Yeah, they use the powers of 2, so that should be simple to understand what each value for each bit is (thanks RPG Hacker!). If you don’t know about powers, then well… you just need to learn some maths :<

If you convert #$01 to binary, you’ll realize that the first bit (final digit) is 1. If you convert 40 to binary, you’ll get a 1 on the 7th bit (second digit). You can add the hexadecimal values to set multiple bits. For example, if you add 20 and 10, you’ll get 30. Since 20 sets bit 6 and 10 sets bit 5, the hexadecimal value 30 will set bits 5 and 6.

Another example, let’s add up all of these values together. 80 + 40 + 20 + 10 + 08 + 04 + 02 + 01 adds up to FF. Since all of these bits are set and you’re adding them all up, #$FF will set ALL bits. Yeah, see for yourself. FF becomes: 11111111.

Now I’ll tell you about the purpose of these bits in SMW. Remember the power-up issue I was talking about earlier in this lesson? You want dash boots and double jump boots to work together. An excellent method of doing this would be using bits; basically in your RAM Address, you can set a bit. Here, we can set 2 different bits, one bit for the dash boots and one for the double jump boots.

We can overcome the issue by setting 2 bits in our RAM Address, one for the dash boots and one for the double jump boots. Let’s say we use just the first 2 bits for these two power-ups. When Mario has the dash boots, we should set the first bit. It’ll become like this:

00000001 ; What we are doing is "setting this bit" (turning it on) when Mario has the dash boots.

When Mario only has the double jump boots, the RAM Address will become like this:

00000010 ; Set the 2nd bit of our RAM Address.

When he has both and when he has none of them are shown respectively:

00000011 ; both bits are set. This means that Mario has the dash boots and the double jump boots.
00000000 ; no bits are set. This means that Mario doesn't have any of the power-ups.

You might already realize this, but when Mario has the dash boots AND gets the double jump boots, he’ll have both power-ups and won’t lose the one he got first. Why? Because one bit doesn’t affect the other. Unless you use the same bit for both power-ups (which of course, is stupid), you can make Mario retain both abilities without having to worry about one affecting the other. This is really where a bit comes to be useful. Furthermore, you won’t need to waste 2 RAM Addresses if you had the idea of doing that, to save RAM bytes.

…Now binary is pretty much over. Now, suppose you actually want to code a power-up which uses the ‘bit system’. You’d need to know the following things:

  1. How to check if a bit is set or clear.
  2. How to ‘activate’ a bit.
  3. How to clear a bit.
  4. There are no opcodes for doing all of the above for the X and Y registers. You’d need to transfer X or Y to A, and when you’re done, you can transfer the result back to X or Y. Remember the TAX, TAY, TXY etc. opcodes, right?

For the first part: you’d only want to execute your power-up’s code when Mario actually has the power-up. That’s where the opcode AND becomes useful.

AND can be used for two purposes - one is to check if a bit is clear or set, and the other is to clear a bit.

How does AND work? Pretty straightforward, really. Firstly, we’ll talk about how AND can check bits.

All you need to do is load your address into the ‘A’, and then use the AND opcode followed by the bit you want to check, and then perform a branching command (either BEQ or BNE).

Hard? Not at all. Here’s a code that will check if bit 1 is set in RAM Address $1F2C.

LDA $1F2C  ; first, load our RAM Address. It must be into A, not X or Y!
AND #$01   ; bit 1 is #$01. I am using the AND opcode, followed by the bit to check.
BEQ NotSet ; branch to NotSet if this bit is not set.
           ; When the bit is clear (0), BEQ will branch. BNE will do the opposite;
           ; BNE here will branch if the bit is not 0, meaning it'll branch when it's 1.

That’s all you need to do when checking if a bit is set. Load into A, then use AND followed by the bit to check for. Then there’s either BEQ or BNE which you have to use:

BEQ branches when the bit is 0.
BNE branches when the bit is 1.

To check if bit 8 was set, what would I have to do? Replace the #$01 with #$80, right?

What if you wanted to branch when either of 2 bits is set? That’s as simple as adding both hexadecimal values and branching:

LDA $1F2C
AND #$30 ; this will branch if either bit 5 or 6 is set (because they are 10 and 20, so I added them together).
BNE BitsSet

To branch only if both of those bits are clear, I would need to replace the BNE with BEQ (and possibly the label, to avoid confusion). To branch only if all 8 bits are clear, I would also replace the AND #$30 with AND #$FF.
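
Note that AND #$30 followed by BNE branches when either bit is set. If you need BOTH bits to be set before branching, one way (a sketch, the label is made up) is to compare the result of the AND against the same value:

```asm
    LDA $1F2C
    AND #$30          ; keep only bits 5 and 6, everything else becomes 0
    CMP #$30          ; are BOTH of them still set?
    BEQ BothSet       ; branch only if bit 5 and bit 6 are both 1
    RTL               ; at least one of them was clear
BothSet:              ; hypothetical label
    ; code for "both bits set" goes here
    RTL
```

This is one of the few places where a CMP after an AND makes sense, because the AND has already thrown away all the bits we don’t care about.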

Clearing a bit with AND is quite similar. Firstly, instead of using a branching command, you store again to that address. BUT, you must NOT work from 00, but instead from FF. For example, you might think this would clear bit 4:

LDA $1F2C
AND #$08 ; this does NOT clear bit 4. It clears every bit EXCEPT bit 4.
STA $1F2C

This is incorrect. Although bit 4 is 08, subtract it from FF instead:

LDA $1F2C
AND #$F7 ; clear bit 4 (FF - 08 = F7).
STA $1F2C

The reason you subtract from FF is that AND only keeps the bits that are 1 in the value you give it, and turns every other bit into 0. F7 is 11110111 in binary: a 1 everywhere except bit 4, so every bit survives except bit 4.

Again, you can clear multiple bits by adding their hexadecimal values together, subtracting the total from FF, and using AND with the result. There’s also an efficient method for clearing bits, but we’ll come to that in a bit.
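
For instance, here’s a sketch that clears bits 5 and 6 of the same example address in one go. 10 + 20 = 30, and FF - 30 = CF:

```asm
    LDA $1F2C
    AND #$CF          ; FF - 10 - 20 = CF, which is 11001111 in binary
    STA $1F2C         ; bits 5 and 6 are now clear, the other 6 bits are untouched
```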

Now, here’s something you might be interested in knowing. Checking a bit could also be done with CMP, don’t you think? Well, that’s partly correct, and partly wrong.

A good example for this would be the controller data. The controller data uses bits, as you can see:

7E:0015 1 byte I/O Controller data 1 - 01=Right, 02=Left, 04=Down, 08=Up, 10=Start, 20=Select, 40=Y and X, 80=B and A
Bit 1 = Right
Bit 2 = Left
Bit 3 = Down

etc.. And yes, bit 7 (value #$40) covers both the Y and X buttons.

So anyway, as I said.. you could use CMP over AND to check for a bit as well. But that can cause some problems. Suppose you want to check if Mario is pressing up. With CMP, you would use:

LDA $15        ; load controller data
CMP #$08       ; compare to value 08..
BEQ PressingUp ; if Mario is pressing up, branch.

This works, but what if you press up and another button? Suppose you’re pressing up and the right button. This code won’t branch then, and will be useless. Can you guess why?

If you haven’t figured it out: when pressing up, the controller data sets bit 4 and becomes 08. The code right now will work fine if Mario presses only up. However, when pressing right as well, bit 1 will also be set. Remember that I said when 2 bits are set, they’re added up? This means that the controller RAM Address, $15, will no longer be 08, but 09 (since bit 4 and bit 1 are set, meaning 08 + 01). Now you’ll realize the code won’t work, because the value isn’t 08 anymore… this is where AND is better than CMP.

AND ignores every other bit in a byte, except the one being checked for. If I check for the up button here, I don’t have to worry about the other keys being pressed because AND only concentrates on one bit.

When you press up, bit 4 will be set. This means it becomes 1. So, to check if Mario is pressing up, here’s a better code:

LDA $15        ; load controller data
AND #$08       ; check for bit 4...
BNE PressingUp ; bit 4 is set, also known as Mario is pressing up. This command branches if Mario is pressing up, because the bit will be 1.

Here’s a bad code. I’ll leave it to you to figure out why:

LDA $15
AND #$04
BEQ PressingDown

Tip: Never ever get confused between CMP and AND. AND is way better to use when checking for bits.

Find AND to be a simple opcode? Let’s move on to our second and third ones - TRB and TSB.

TRB - Used to clear a bit of a RAM Address.
TSB - Used to set a bit of a RAM Address.

Opposites, like BEQ and BNE.

To set a bit, you can do it in a rather cheap way. Assuming that $0DC6 is 00 at the beginning of this code:

LDA $0DC6
CLC
ADC #$04; this will add 4, and make $0DC6 become #$04. This means that we're setting bit 3.
STA $0DC6

Yes, you can do that. But why not use TSB instead?

LDA #$04  ; load the bit needed to set..
TSB $0DC6 ; and set it in that RAM Address.

This is how TSB works - first you load the bit (or bits, if you want to add 2 bits together), and then TSB it to the RAM Address you want the bit set on. The code above gives the same result as the CLC ADC one when the bit starts out clear, but it’s better of course: it’s shorter, and if the bit is already set, TSB leaves it set, while adding 04 again would mess up the value.

Then there’s TRB - clearing a bit. It works in the exact same way as TSB:

LDA #$04  ; load the bit needed to clear..
TRB $0DC6 ; and clear it from that RAM Address.

Likewise, you can clear multiple bits by adding both bits together and loading that.

You probably realize that unlike the AND function, you work from 00 instead of FF. This is how TRB and TSB work. For instance:

LDA #$04 ; load the bit needed to clear..
TRB $0DC6 ; and clear it from that RAM Address.

Will become:

LDA $0DC6
AND #$FB
STA $0DC6

It works the other way around.
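
To tie this back to the power-up problem from the start of the lesson, here’s a rough sketch of how the dash boots could use bit 1 of $0DC6. The three routine names and the NoDash label are made up, and where you’d actually call them from depends on your hack:

```asm
GetDashBoots:         ; hypothetical: run when the dash boots are collected
    LDA #$01          ; bit 1 = dash boots
    TSB $0DC6         ; set it, leaving the double jump bit (02) alone
    RTL

CheckDashBoots:       ; hypothetical: run every frame
    LDA $0DC6
    AND #$01          ; look at the dash boots bit only
    BEQ NoDash        ; bit clear = Mario doesn't have them
    ; dash boots code goes here
NoDash:
    RTL

LoseDashBoots:        ; hypothetical: run when Mario loses the boots
    LDA #$01
    TRB $0DC6         ; clear the dash boots bit, double jump stays as it was
    RTL
```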

Here’s an important note to know now - remember how I talked about long addresses, the ones with 6 digits? TSB / TRB won’t work on those addresses. This code fails:

LDA #$40
TSB $7007FF

If you want to clear out a bit with a 24-bit address, you’ll need to use the AND method I posted before. The usage is:

LDA $ADDRESS
AND #$xx ; FF minus the bit(s) to clear.
STA $ADDRESS

What about setting a bit to a 24-bit address though? TSB won’t work. Time to learn one that works - ORA.

ORA, first of all, can set a bit (or bits) in a RAM Address. The usage is similar to AND:

LDA $ADDRESS
ORA #$xx ; bit(s) to set
STA $ADDRESS

Note that ORA and AND work on 2/4 digit addresses as well as 6 digit ones. TSB and TRB are still preferred for the 2/4 digit ones, however, as they shorten the code. Another difference is that AND starts from FF, while ORA, TRB and TSB start from 00.
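
Here’s what that looks like with real numbers filled in. This sketch sets bit 7 (#$40) of $7F8620, the example long address from Lesson 6; swap in whatever address and bit you actually use:

```asm
    LDA $7F8620       ; load the current value from the long address
    ORA #$40          ; set bit 7, all other bits stay as they were
    STA $7F8620       ; store it back
```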

So ORA can set bits in a RAM Address, but it can also be used for another purpose. It can be used to check whether a bunch of addresses are all zero, or whether at least one of them is not zero:

LDA $187A ; Yoshi? ... ($187A is the Yoshi flag. If zero, Mario is not on Yoshi).
ORA $73   ; Ducking? ($73 is the ducking flag. Once again, zero means not ducking)
CMP #$00  ; this line can be considered irrelevant.
          ; If you want to compare to zero, you can directly use BEQ.
          ; I just wrote it here so that it's clear to understand.
BEQ BothAreZero ; the result is zero only when BOTH flags are zero.

The way this works is that we first load a RAM Address normally and can compare it directly.

LDA $187A
BEQ NotOnYoshi ; As I said before, a CMP #$00 is irrelevant. If using BNE, this branches if it's anything other than 0. 

In case you want to check if another address is zero, instead of loading and comparing to 00 again, you could add an ORA in between the LDA $187A and the BEQ NotOnYoshi as I did above. Here’s a code that will make Mario big only if he’s in water or on Yoshi:

    LDA $187A            ; load first address with an LDA.. this one loads the Yoshi flag (00 = not on Yoshi)
    ORA $75              ; water flag. Is Mario in water? 01 = yes.
    BNE OnYoshiOrInWater ; branch if Mario is in water, or on Yoshi. (CMP #$00 = irrelevant.)
    RTL                  ; just end the code if not.
OnYoshiOrInWater:
    LDA #$01             ; the code that
    STA $19              ; makes Mario big.
    RTL                  ; and return.

You can use ORA as many times as you want here, just note that you always load the first one into A (LDA $RAM). So remember, for ORA, the following applies:

BEQ (label) - Branch if ALL of the addresses are 0.
BNE (label) - Branch if any of the addresses are NOT 0 (can be anything other than 0.)
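
A quick sketch with three addresses chained together, using the Yoshi, ducking and water flags from above (NoneOfThem is a made-up label). ORA only leaves A at zero when every address in the chain is zero, so the BEQ is taken only when Mario is not on Yoshi, not ducking and not in water:

```asm
    LDA $187A         ; on Yoshi?
    ORA $73           ; ducking?
    ORA $75           ; in water?
    BEQ NoneOfThem    ; the result is 0 only if all three are 0
    RTL               ; at least one flag is set: do nothing
NoneOfThem:           ; hypothetical label
    LDA #$01          ; the code that
    STA $19           ; makes Mario big
    RTL
```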

So far, you should understand AND, TRB, TSB and ORA. Now, we’ll learn about one more - EOR.

EOR is an opcode that can invert or ‘flip’ bits around. What it does is take the value in A, line it up against another value, and apply some rules to get a result. The rules apply in binary, so first convert the hexadecimal values to binary.

For example, let’s EOR #$FF on the value 02.

LDA #$02
EOR #$FF in binary are:
00000010
11111111

Using some rules, we generate a result. The rules are as follows:

1 EOR 0 = 1 - If the first digit is 1, and the second is 0, the result for this digit will be 1.
0 EOR 1 = 1 - If the first digit is 0, and the second is 1, the result for this digit will be 1.
1 EOR 1 = 0 - If both digits are 1,  the result becomes 0. (Similar)
0 EOR 0 = 0 - If both digits are 0, the result becomes 0. (Similar)

So let’s apply these rules with our values:

00000010 ; 02
11111111 ; FF
11111101 ; FD

This concludes that:

LDA #$02
EOR #$FF

This now makes the value in A FD. But all we’re doing is inverting values? How is that going to help us in SMW hacking? Well, here’s a good example of how it’s used. Mario’s X speed is stored in $7B. When going right:

00 = Not moving.
7F = Maximum.

7F is the maximum speed you can have right, but the fastest you can be in the game is around 28 (at full speed.) But anyway, when going left:

00 = Not moving.
80 = Maximum. (This goes backwards, meaning FF is the slowest going left, and it goes back to 80)

Again, maximum speed when going left is roughly around D7. Now what if you want to code something that increases Mario’s speed? Like dash boots. You could just set a new X speed for Mario - let’s say #$34.

LDA #$34
STA $7B

There’s a problem with this code. As you can see, this will make Mario faster when he’s going right, but when he’s going left, it’s going to cause problems. Why? Because #$34 is a “going right” value. Going left uses the values from FF down to 80, so storing #$34 while Mario moves left would shove him to the right.

To overcome this problem, we can flip our new X speed around when he goes left, so that one value works for both directions.

    LDA #$34       ; the speed we want, as a "going right" value.
    LDY $76        ; if $76 is = 01..
    BNE StoreSpeed ; Mario is going right, so #$34 is fine as it is.
    EOR #$FF       ; going left: flip it around (#$34 becomes #$CB)
    INC A          ; and add 1 to get the exact opposite speed (#$CC).
StoreSpeed:
    STA $7B        ; set the X speed, which now also works when going left.

Let me explain how this works. These are the equivalents for Mario’s speed when going left and right.

RIGHT LEFT
00      00
01      FF
02      FE
03      FD
04      FC
05      FB
...      ...etc.

As you can see, when going right it increases, and when going left it decreases. If I take 01, EOR #$FF it and add 1, I’ll get FF. If I do the same to 02, I’ll get FE. It works backwards too: FF gives 01 again. So what I’ve done is taken a “right speed” and turned it into the equivalent “left speed”. The extra INC A is needed because EOR #$FF on its own lands one step short (01 would become FE instead of FF). But why EOR #$FF and not some other value? It’s because FF has all 8 bits set, so every bit of the speed gets flipped. If I did, for example, EOR #$FB instead, bit 3 would be left as it was and the result would be wrong.

Here’s another example. Here is how you might imagine pausing works - if Mario is pressing START, it’ll pause. Then it checks if he presses START again, and unpauses if he does:

Code:
    LDA $15
    AND #$10    ; if start is being pressed.. (if the bit is set)
    BNE Pause
    RTL
Pause:
    LDA #$01    ; set the pause flag on..
    STA $13D4   ; so the game pauses.
    LDA $15
    AND #$10    ; if Mario is pressing start again..
    BNE UnPause ; then branch to UnPause.
    RTL
UnPause:
    STZ $13D4   ; clear the pause flag again.
    BRA Code    ; repeat our code.

In fact, this is a very inefficient code. By using EOR, we can easily do the above code in a better way:

    LDA $15
    AND #$10   ; check normally if pressing start..
    BEQ Return ; return if not pressing start (bit clear)
    LDA $13D4  ; load the pause flag..
    EOR #$01   ; flip it
    STA $13D4  ; and store the new pause flag status
Return:
    RTL        ; Return. Note that I don't need to add a return after STA $13D4 because there already is one here.

How this works is that it normally checks if you’re pressing START. When doing so, EOR will make the value 00 if it’s 01, and vice versa (01 if it’s 00). EOR is basically “flipping” the pause flag from 0 to 1 when you press start in this code.

Let’s apply the binary formula again. Let’s suppose we’re not pausing (00), and we EOR #$01 to the pause flag.

00000000 ; 0
00000001 ; 1

Result = 00000001; 1

This means that the pause flag will be 1, meaning it’s on. Now imagine the pause flag is on, and we EOR #$01 again.

00000001 ; 1
00000001 ; 1

Result = 00000000 ; 0

This means that the pause flag is off again. So we’ve made that garbage code into a really simple one by effectively using EOR.
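
The same trick works on any single bit, not just a 00/01 flag. As a sketch, this flips bit 2 of $0DC6 (the double jump bit in our imaginary power-up RAM) and leaves the other 7 bits alone, because a 0 in the EOR value never changes the bit it lines up with:

```asm
    LDA $0DC6
    EOR #$02          ; flip bit 2 only: 0 becomes 1, 1 becomes 0
    STA $0DC6         ; the other 7 bits come out exactly as they went in
```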

That’s basically all you’ll need when working with bits. There are other bitwise operations, such as BIT, but you won’t find those very useful when working with code, so you should just learn these ones instead. They are pretty handy at times.

Let’s move on to Lesson 8, The Stack.

Lesson 8: The Stack - Preserving values and RAM

The Stack is really useful when you want to “preserve” your values or addresses, and then “pull” them later. Let’s give an example:

You have a stack of 9 books, one book on top of the other:

-- Book 1
-- Book 2
-- .........
-- Book 9.

Now I want to add 10 more books here:

-- Book 10
-- Book 11
-- ...........
-- Book 19.

Okay, so now there’s a pile of 19 books on our shelf. The problem: I want to take out the 11th one. It’s obviously going to be hard as I’m going to have to count to 11 and then take out that book, isn’t it?

A better way would be to “preserve” book 11 instead. With preserving it, just imagine that it’s kept in another area temporarily, so that I can easily take the book when I want to.

The Stack works the same way - it’s like a bookshelf. If you want to preserve your values and load them back later (pull them back), you can easily do so, as many times as you want to.

To work with pushing/pulling, you load your value or address, push it, and then pull it back when needed. You can push with the A, X, and Y registers.

  • Pushing with the A register preserves whatever value is in A.

  • Pushing with the X register preserves whatever value is in X.

  • Pushing with the Y register preserves whatever value is in Y.

  • PHA and PLA are used to preserve and pull a value in A. PHA pushes it, and PLA pulls it back.

For X and Y, PHX / PLX and PHY / PLY are used.

Here’s another thing to note. Even when you load A with something and then preserve it, it’s still in the accumulator! That means you can still use it for stuff, like comparing, without having to worry about whether it got preserved. The following example shows what I mean, by pushing the value of a RAM address that was loaded into the accumulator.

LDA $0660 ; Load some RAM Address (Imagine this is some custom address which we're using).
PHA ; preserve it for later.
LDA #10
STA $0DBE ; do some other stuff. (Guess what this does.)
PLA
STA $0DBF ; .. store whatever we preserved into $0DBF ($0660 was preserved, now the value in that will be in $0DBF)

Of course, we could have easily done this:

LDA #10 ; .. make the lives counter
STA $0DBE ; become 10. (We could also use #$0A instead of #10).
LDA $0660 ; .. store $0660
STA $0DBF ; .. to $0DBF
          ; This means that the value in $0660 will now be in $0DBF, the coin counter.
          ; (If it was 10, we'll be storing 10 into $0DBF.)

That code is even shorter! So what was the point of preserving there?

Well, that was just a lame example to show how preserving works. Don’t preserve values in useless cases like the example above, because it just wastes more space (bytes) in your code while doing the exact same thing. I’ll give you a good example later.

Now, here’s a really important thing to remember. Let’s imagine the book example I stated before.

You preserved book 11 elsewhere right? Let’s say book 13 is also important. So I put that one in our “reserve area” too. The stack works by putting stuff on top of each other. So book 13 will be right above book 11. Let’s imagine I’ve pushed values 11 and 13:

LDA #11
PHA ; We pushed 11 first, so that comes before value 13.
LDA #13
PHA

-- Book 13
-- Book 11

For a small while, I need to take out book 11. If I take out the first book, I’ll take out (PLA) the top one, right? But that’s book 13, not the one I want. So I’ll need to pull (PLA) again to get book 11. What do you notice?

As a general rule, the number of times you push must equal the number of times you pull to load the proper value you want.

What if, suppose, I pull again? None of my books are on the stack anymore, so what will I get? Well, in SMW coding you’ll be pulling something that belongs to other code, and the likely result is that your game will crash! Make sure that the pushes and pulls are equally balanced, and that at the end of your code, nothing of yours is on the stack anymore!

  • Push once, and you must pull only once.
  • Push 5 times, and you must pull five times.
  • Push 30 times, and you must pull thirty times. (Of course, no one needs to push that much.)
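
For example, here’s a rough sketch of preserving all three registers at once. Notice that the pulls come in the opposite order of the pushes, because the last thing pushed is the first thing pulled:

    PHA        ; push A (bottom of our three)
    PHX        ; push X
    PHY        ; push Y (top)
               ; ... do whatever you want with A, X and Y here ...
    PLY        ; Y was pushed last, so it's pulled first
    PLX        ; then X
    PLA        ; and finally A
    RTL

Three pushes, three pulls, so nothing of ours is left behind on the stack when the code ends.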

What about in comparing though? Here’s an example:

    LDA $0DBF
    PHA
    CMP #$05 ; compare $0DBF to 05 (remember, even though I pushed it, it's still loaded into the accumulator)
    BCC Return ; return if I don't have 5 coins.
    PLA
    STA $0EF9 ; store $0DBF to $0EF9 (an area on the status bar; to be specific, the 'M' in Mario).
    RTL
Return:
    PLA ; .. pull back and return directly.
    RTL

Note that there are 2 pulls, but only one push. Can you guess why? I always push $0DBF at the beginning of the code. Then I compare it to 05. If it’s less than 05, I’ll just directly end my code. However, $0DBF is still on the stack. Remember that I said before ending your code, nothing must be on the stack? That’s why I pull this one separately and end my code.

If $0DBF is 5 or more, it’s still on the stack. So I pull it back and store it to $0EF9. I must put an RTL after the STA $0EF9 because if I didn’t, it would pull again (the Return label has another PLA) and thus crash the game. Remember this. When branching like this, the number of PLAs you write can differ from the number of PHAs, but whichever way the code actually runs, it must never pull more or fewer times than it pushed. Make sure that the pushes and pulls in your code are correct.
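
If you’d rather have just one pull, you can also pull before branching. This is only a sketch of an alternative, not something you have to do. It relies on the fact that PLA changes the negative and zero flags but leaves the carry flag alone, so the BCC still sees the result of the CMP:

    LDA $0DBF
    PHA
    CMP #$05   ; compare $0DBF to 05 (this sets or clears the carry flag)
    PLA        ; pull right away; the carry flag from the CMP is untouched
    BCC Return ; return if I have fewer than 5 coins
    STA $0EF9  ; otherwise store $0DBF to $0EF9
Return:
    RTL

Now there’s one push and one pull no matter which way the branch goes.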

Here’s a real example, where preserving is almost always used:

A common use is in sprites. Sprites use the X register for tables/indexing (I know we haven’t yet covered how to make sprites, but just note that when making them, they use the X register to work). This means that if you load something else into X for your own purposes and the sprite code then tries to use its sprite tables (they’re just tables used with indexing in sprites), your sprite won’t function. Alternatively, you could use the Y register, right? But what if you’re already using that too?

A better idea would be to preserve the X register. We can preserve the sprite’s index in X, and pull it back after we’re done with our own work with the X register. Just assume this is in a sprite’s code (note: don’t try to understand it, because it’ll be confusing, so just look at the code):

PHX         ; Push the X register
LDX $19
LDA Table,x ; Imagine I'm using some sort of table.
            ; This table doesn't have anything to do with the sprite main tables, it's for my own code.
STA $0660   ; Store it somewhere.
PLX
RTL         ; Pull back X and terminate.

As I said, imagine that this code is in a sprite. The sprite tables ALWAYS use the X register while a sprite is running, so if I modify X in some way without preserving it, the sprite probably won’t appear on the screen or function properly.

So what are the sprite tables anyway? Well, they’re used for various purposes, such as the sprite’s X/Y speed, the number of times to go through a loop, etc. You’ll have a better understanding of this when we reach the sprite-creating tutorial later.

Now remember, I can’t really give good examples of pushing/pulling (aside from the sprite one), but normally you can easily tell yourself when it’s appropriate to push and pull a value or address. However, do note that when working with custom codes, it’s always a good idea to preserve the contents of the accumulator RIGHT before your code starts, and pull it back at the very end. This sometimes is useful when hacking a routine. (We’ll come to that later.)

Don’t try doing something stupid though, like this:

LDX $0DBF ; load $0DBF into X.
PHA       ; Push A.
LDA $0DBE
STA $0EF9 ; store $0DBE over here.
PLX
RTL

Can you guess what’s wrong? Firstly, after loading $0DBF into X, I preserve A, the contents of the accumulator. I didn’t load anything into A, but whatever is in there is still preserved, so there is nothing wrong with that.

Then I just do some random other code; store $0DBE into $0EF9. Again, there’s nothing wrong with that.

Thirdly, I pull back X and return. Here’s the problem: I actually pushed A, not X! So I’m pulling into X something that was pushed from A, and the value I pushed from A never gets pulled back into A. I should just change the PLX to PLA!
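
With that one change, the code would look like this:

LDX $0DBF ; load $0DBF into X.
PHA       ; Push A.
LDA $0DBE
STA $0EF9 ; store $0DBE over here.
PLA       ; Pull A back, matching the PHA above.
RTL

(The LDX still isn’t doing anything useful here, but at least the push and the pull match now.)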

Here’s another short sample code before I move on to something about JSR / JSLs and the stack (don’t worry, it’s nothing too difficult):

LDA #$04
PHA
LDA #$06
PHA
LDA #$08
STA $0EFA
PLA
STA $0EFB
PLA
STA $0EFC

Can you guess which gets stored where? This will help us:

06 <- I pushed this one after 04, so it goes on top of it.
04 <- I pushed this one first, so it goes at the bottom.

The first PLA gets the top value back and stores it into $0EFB. The second PLA gets the next value, which is then stored into $0EFC.

06 -> $0EFB.
04 -> $0EFC.


08 -> $0EFA (08 was never on the stack. Stored immediately after loading.)
06 -> $0EFB (06 was pushed to stack last. Pulled off first because it was on the top.)
04 -> $0EFC (04 was pushed to stack first. Pulled off last because it was on bottom.)

So the last value you preserve is always loaded when you do your first pull. Okay, let’s learn something quite important. This code will fail:

    PHA
    JSR Code
    LDA #$04
    STA $0EF9
    RTL
Code:
    PLA
    RTS

Nothing looks wrong in it though, right? There’s the correct number of pushes/pulls (01), and the subroutine ends with an RTS like it’s supposed to do (JSR -> RTS).

The problem here, is the JSR (a JSL will also cause a problem here, so imagine it’s for both of them.) ..Wait, how is the JSR causing the code to crash?

These commands use the Stack to store what part of the code to jump back to. So, if you Push to the Stack, then JSR, then Pull again before the JSR code is finished, well, your code will make itself into some delicious scrambled eggs.

The JSR and JSL commands store the place in the code to return to (in this case, right after the JSR Code) on the stack. A JSR stores 2 bytes there (a JSL stores 3, but the idea is the same), like this:

-- Byte 2
-- Byte 1

The RTS (or RTL) at the end of the subroutine pulls these bytes back off the stack to jump back to the main code.

In my code, I pull as soon as the subroutine starts. Because I pushed before the JSR, my value is underneath the JSR’s two bytes:

-- Byte 2 ;\
-- Byte 1 ;/ These 2 are for the JSR.
-- Byte 3 <- The value (or address) we preserved.

Now, after the PLA, there’s an RTS (but we can also imagine there’s some other code before the RTS). The PLA takes whatever is on top, which is byte 2, not our value. Then, at the end of the subroutine, the RTS uses the top two bytes left on the stack to return to the main code again. Now can you guess what’s wrong here?

The RTS is supposed to use byte 1 and byte 2, but it’s actually using byte 1 and byte 3! Byte 2 ended up in A, and byte 3 was for our code, yet the RTS is using it! This means your game will crash because the RTS doesn’t know where to return to!

Always keep that in mind when using the pushing/pulling commands (this also applies to X and Y of course, as they share the stack with A) together with the JSL / JSR commands, because you don’t want your game to crash!
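
To see it step by step, here’s the failing code again with the stack written out in the comments. This is just a sketch: “ret hi” and “ret lo” are names I’m making up for the two bytes the JSR pushes (the high and low byte of the place to return to), and the top of the stack is on the right:

    PHA        ; stack: [A]
    JSR Code   ; stack: [A] [ret hi] [ret lo]
    LDA #$04   ; we never get back here properly
    STA $0EF9
    RTL
Code:
    PLA        ; pulls "ret lo" into A.  stack: [A] [ret hi]
    RTS        ; pulls "ret hi" and our A value and uses them as the return address. Wrong!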

A good code for the above example could be:

    PHA
    JSR Code
    LDA #$04 ;\
    STA $0EF9 ;/ Random code. I am obsessed with $0EF9-$0EFC, aren't I? 
    PLA
    RTL
Code:
    RTS

It’s always the best idea to push your value(s), and then pull it (them) back after the subroutine finishes. For example:

    PHX
    JSR Code
    PLX
    RTL
Code:
    RTS ; You can put whatever code you want here.

Well, that’s all you need to know for the stack. These commands use it:

PHA - Push A. (It can push a value, or an address)
PHX - Push X. (It can push a value, or an address)
PHY - Push Y. (It can push a value, or an address)
PLA - Pull A. (It can pull a value, or an address)
PLX - Pull X. (It can pull a value, or an address)
PLY - Pull Y. (It can pull a value, or an address)
JSL - This pushes 3 bytes to the stack to remember where in the code to return to.
RTL - This pulls those 3 bytes back, so it must be in a subroutine which is accessed through a JSL.
JSR - This pushes 2 bytes to the stack to remember where in the code to return to.
RTS - We covered this in lesson 4 as well. This pulls those 2 bytes back, so it must be used in a subroutine accessed through a JSR.

So a JSL / JSR work the same way, as I said in lesson 4, but they use RTL and RTS respectively.

Now you can tell why: they push different numbers of bytes to and pull different numbers of bytes from the stack, and if you mix a JSR with RTL or JSL with RTS, you’ll get the wrong number of bytes and thus, the code fails.
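
Here’s a small sketch of the correct pairs side by side. The labels Near and Far are made up for this example, and I’m assuming this code was itself reached through a JSL, like the other examples:

    JSR Near   ; pushes 2 bytes
    JSL Far    ; pushes 3 bytes
    RTL
Near:
    RTS        ; pulls 2 bytes, matching the JSR
Far:
    RTL        ; pulls 3 bytes, matching the JSL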

Of course, the stack is also used elsewhere for more complex stuff, but we won’t be going through that in this tutorial as it is obviously advanced and not really needed.

ASM Tutorial Part 1: Lost? Start from Part 1.