DMA, or "direct memory access", is found in a number of computer systems, not just the Super Nintendo. It's basically a way for a peripheral or coprocessor to read data directly from memory, instead of requiring the main CPU to do a number of reads and writes. This is typically faster, if only because it lets the system skip the opcode fetch-and-decode. In the SNES, the CPU is paused during DMA since the address busses are in use for the transfer.

HDMA is similar in concept, though rather different in execution: instead of transferring a block of memory all at once, it transfers a few bytes during the H-Blank period of each scanline. This is extremely helpful, as this narrow window is the only time during a frame when most PPU registers may be changed (at least without glitching).

The SNES has 8 channels (numbered 0-7) that can be used for either DMA or HDMA. HDMA takes priority over DMA if both are to occur at once, pausing all DMA and terminating a conflicting DMA immediately. Lower-numbered channels take priority over higher-numbered channels.

DMA

A DMA transfer has three main variables, and a number of setting bits. These are: (those marked '*' must be set up before starting DMA)

  • Direction (bit 7 of $43x0): Read from PPU or write to PPU?
  • Fixed (bit 3 of $43x0): Adjust Address?
  • Increment (bit 4 of $43x0): Direction to adjust Address?
  • Mode (bits 0-2 of $43x0): See below...
  • Port (register $43x1): If this is 'xx', the register accessed will be $21xx.
  • AAddress (registers $43x2-4): Any CPU address (A-bus,) just like you'd use with the Absolute Long addressing mode.
  • Count (registers $43x5-6): The number of bytes to transfer.

Transfer values:

  Address         What it needs
  $43x1           The address on the B-Bus stored in 8 bits. Holds the xx in $21xx.
  $43x2 - $43x4   The long, absolute value of the location on the A-Bus.
  $43x5 - $43x6   The number of bytes (0 - 65535; 0 is weirdly treated as 65536 though)

The transfer flags:

  Bit(s)   What it does                   What it needs
  7        Transfer Direction             0: A to B-Bus, 1: B to A-Bus
  4 - 3    Auto update pointer on A-Bus   00: Increment, 10: Decrement, x1: Do nothing
  0 - 2    Unit size & Format             000: 1 byte to 1 register (write once)
                                          001: 2 bytes to 2 registers (write once)
                                          010: 2 bytes to 1 register (write twice)
                                          011: 4 bytes to 2 registers (write twice)
                                          100: 4 bytes to 4 registers (write once)

See register $43x0 for the correspondence between the Mode bits and the transfer mode. Note that One Register Write Once and One Register Write Twice end up being the exact same thing, and Two Registers Write Once and Two Registers Write Twice Alternate are the same, but that Two Registers Write Once and Two Registers Write Twice Each are different.
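
As a rough illustration of the flag layout above, the control byte can be unpacked like this. The names `DmaControl`, `decode_control` and `MODE_OFFSETS` are made up for this sketch, and only the five modes listed in the table are included; modes 5-7 are deliberately left out.

```python
from typing import NamedTuple

# Port offsets written during one unit, for the modes listed in the table.
# Modes 5-7 are not listed there, so this sketch leaves them out.
MODE_OFFSETS = {
    0b000: (0,),          # 1 byte to 1 register
    0b001: (0, 1),        # 2 bytes to 2 registers
    0b010: (0, 0),        # 2 bytes to 1 register, written twice
    0b011: (0, 0, 1, 1),  # 4 bytes to 2 registers, each written twice
    0b100: (0, 1, 2, 3),  # 4 bytes to 4 registers
}


class DmaControl(NamedTuple):
    b_to_a: bool  # bit 7: False = A-Bus to B-Bus, True = B-Bus to A-Bus
    step: int     # bits 4-3: +1 increment, -1 decrement, 0 fixed
    mode: int     # bits 0-2: transfer mode


def decode_control(value: int) -> DmaControl:
    """Split a $43x0 value into direction, address step and mode."""
    if value & 0x08:      # Fixed wins regardless of bit 4 (the 'x1' case)
        step = 0
    elif value & 0x10:    # bit 4 set: decrement
        step = -1
    else:
        step = 1
    return DmaControl(bool(value & 0x80), step, value & 0x07)
```

For example, `decode_control(0x01)` describes an incrementing A-to-B transfer in mode 1, whose per-unit offsets are `MODE_OFFSETS[1]`.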

DMA transfers take 8 master cycles per byte transferred, no matter the FastROM setting. There is also an overhead of 8 master cycles per channel, and an overhead of 12-24 cycles for the whole transfer.
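
The cost figures above can be added up as follows. This is only a back-of-the-envelope sketch: `dma_master_cycles` is a hypothetical helper, and since the whole-transfer overhead is given as a 12-24 cycle range, it is passed in as a parameter rather than computed.

```python
def dma_master_cycles(byte_counts, transfer_overhead=12):
    """Estimate master cycles for one DMA run.

    byte_counts: actual bytes moved by each active channel
                 (a Count register of 0 means 65536 bytes).
    transfer_overhead: whole-transfer overhead, somewhere in 12..24.
    """
    if not 12 <= transfer_overhead <= 24:
        raise ValueError("whole-transfer overhead is 12-24 master cycles")
    # 8 cycles of overhead per channel, plus 8 cycles per byte.
    return transfer_overhead + sum(8 + 8 * n for n in byte_counts)
```

A single channel moving 2048 bytes therefore costs between `dma_master_cycles([2048], 12)` and `dma_master_cycles([2048], 24)` master cycles.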

The basic process seems to be:

A. Get byte and write it to the destination.

  • The DMA seems to take advantage of the SNES's two address busses with one shared data bus. AAddress is pushed out Bus A, Port is pushed out Bus B, and the read/write signals are sent according to Direction. The bus marked read obligingly puts data on the data bus, while the bus marked write obligingly writes that value.
  • Thus, since the PPU/APU/WRAM registers are only accessible via Bus B, attempts to access them via AAddress will result in Open Bus accesses.
  • Attempts to access WRAM via both Bus A and Bus B (registers 2180-3) will fail, with the 2180-3 access being Open Bussed.
  • Also, DMA cannot access the $4300-$437f registers nor $420b nor $420c. Writes will have no effect, and reads will return Open Bus.

B. Adjust AAddress.

  • If Fixed is set, do nothing. Else if Increment is set, subtract one, else add one.
  • Note that the bank byte is not modified.

C. Decrement Count.

  • If Count is not zero, then go to step A.
  • Thus, if Count is initially zero, it wraps to 65535 before being tested. So you end up transferring 65536 bytes.

Note that Count ($43x5-6) ends up always 0, unless a conflicting HDMA terminates the transfer early.
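
Steps A to C can be sketched as a small loop. This is a simplified model rather than a description of the hardware: it only handles the A-to-B direction, `a_bus` is a plain dict standing in for memory, `offsets` is the per-unit port offset pattern for the chosen mode, and wrapping Port plus offset within 8 bits is an assumption of this sketch. The name `run_dma` is hypothetical.

```python
def run_dma(a_bus, bank, addr, count, port, offsets, step):
    """Simulate one A-to-B DMA channel.

    Returns (writes, final_addr, final_count), where writes is a list of
    (b_bus_register, value) pairs in the order they were performed.
    """
    writes = []
    i = 0
    while True:
        # Step A: read from the A-Bus, write to $21xx on the B-Bus.
        value = a_bus.get((bank << 16) | addr, 0)
        offset = offsets[i % len(offsets)]
        writes.append((0x2100 | ((port + offset) & 0xFF), value))
        i += 1
        # Step B: adjust AAddress; the bank byte is never modified.
        addr = (addr + step) & 0xFFFF
        # Step C: decrement Count, then test it.
        count = (count - 1) & 0xFFFF
        if count == 0:
            break
    return writes, addr, count
```

Because Count is decremented before it is tested, calling this with `count=0` performs 65536 writes, and Count is always 0 when the loop ends.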

HDMA

HDMA has 4 flags and 5 variables. Again, those marked '√' are required before starting HDMA. In addition, those marked '+' are required if HDMA is to be started mid-frame.

  • √ Addressing Mode (bit 6 of $43x0): If clear, Direct, else Indirect.
  • √ Transfer Mode (bits 0-2 of $43x0): See below...
  • √ Port ($43x1): As for DMA.
  • √ AAddress ($43x2-4): Pointer to the HDMA Table. Not really 'required' for starting mid-frame, but unless you're going to stop it before the next init...
  • [-] Indirect Address ($43x5-6): Used with Indirect Bank. See below...
  • √ Indirect Bank ($43x7): Used with Indirect Address. See below...
  • [+] Address ($43x8-9): See below...
  • [+] Repeat (bit 7 of $43xA): Whether to write every scanline or not
  • [+] Line Counter (bits 0-6 of $43xA): See below...
  • [-] Do Transfer: Used internally.

Modes are the same as for DMA. However, note that only one cycle through the mode is done per scanline, so One Register Write Once will write 1 byte per scanline, while One Register Write Twice will write two.

For each scanline during which HDMA is active (i.e. at least one channel is not paused and has not terminated yet for the frame), there are ~18 master cycles overhead. Each active channel incurs another 8 master cycles overhead (during which time $43xA is presumably loaded if necessary) for every scanline, whether or not a transfer actually occurs. If a new indirect address is required, 16 master cycles are taken to load it. Then 8 cycles per byte transferred are used. Thus, HDMA takes a maximum of 466 master cycles per scanline (if all 8 channels are active, require an indirect address load, and transfer 4 bytes).
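
The 466-cycle figure follows directly from the numbers above: 18 + 8 × (8 + 16 + 4 × 8) = 466. A small sketch of the same arithmetic, with `hdma_scanline_cycles` as a hypothetical helper and the approximate 18-cycle overhead taken at face value:

```python
def hdma_scanline_cycles(channels):
    """Estimate HDMA master cycles for one scanline.

    channels: one (bytes_transferred, loads_indirect_address) pair per
              active channel; bytes_transferred may be 0.
    """
    channels = list(channels)
    if not channels:
        return 0                    # HDMA is not active on this scanline
    total = 18                      # approximate per-scanline overhead
    for nbytes, loads_indirect in channels:
        total += 8                  # per-channel overhead
        if loads_indirect:
            total += 16             # new indirect address
        total += 8 * nbytes         # 8 cycles per byte transferred
    return total
```

With all 8 channels active, each loading an indirect address and transferring 4 bytes, `hdma_scanline_cycles([(4, True)] * 8)` gives the stated maximum.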

The basic process has two sections. First, at the beginning of the frame (V=0 H=approx 6), for all active HDMA channels (see register $420c):

  1. Copy AAddress into Address.
  2. Load $43xA (Line Counter and Repeat) from the table. I believe $00 will terminate this channel immediately.
  3. Load Indirect Address, if necessary.
  4. Set Do Transfer to true.

The CPU is paused during this time. Overhead is ~18 master cycles, plus 8 master cycles for each channel set for direct HDMA and 24 master cycles for each channel set for indirect HDMA.

If you are starting HDMA mid-frame, you must basically do the init process manually by setting $43x8-A, and $43x5-6 for indirect channels. Note though that there is no way to perform step 4, so no transfer will be done the first transfer period. Also, note that a channel that has already terminated for the frame cannot be restarted. XXX: Or does it automatically do Step 4 when you enable the channel?

Then, for each scanline from V=0 to V=$e0 (or V=$ef if overscan is enabled) at about H=$116:

1. If Do Transfer is false, skip to step 3.

2. For the number of bytes (1, 2, or 4) required for this Transfer Mode...

  • Read a byte from Address or Indirect Address, and increment.
  • Write the byte to Port, Port+1, Port+2, or Port+3, depending on the Transfer Mode and which byte we're on. The same notes regarding DMA from PPU to PPU or RAM to RAM via $2180 apply here as well.

3. Decrement $43xA.

4. Set Do Transfer to the value of Repeat.

5. If Line Counter is zero...

  • Read the next byte from Address into $43xA (thus, into both Line Counter and Repeat).
  • If Addressing Mode is Indirect, read two bytes from Address into Indirect Address (and increment Address by two bytes). One oddity: if $43xA is 0 and this is the last active HDMA channel for this scanline, only one byte is loaded for Indirect Address, and the $00 is used for the low byte. So Address ends up incremented one less than otherwise expected, and one less CPU Cycle is used.
  • If $43xA is zero, terminate this HDMA channel for this frame. The bit in $420c is not cleared, though, so it may be automatically restarted next frame.
  • Set Do Transfer to true.

6. Continue with Step 1 next scanline.
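
The init and per-scanline steps above can be walked through in code for a single channel. This is a minimal sketch under several assumptions: it only models Direct addressing, the table is a Python sequence indexed from 0 (standing in for AAddress), a `$00` loaded at init terminates the channel (as the text suspects), and `run_direct_hdma` and its parameters are hypothetical names.

```python
def run_direct_hdma(table, offsets, scanlines):
    """Return, per scanline, the list of (port_offset, value) writes."""
    # Start of frame.
    address = 0                         # 1. copy AAddress into Address
    line_reg = table[address]           # 2. load $43xA from the table
    address += 1
    terminated = line_reg == 0
    do_transfer = True                  # 4. set Do Transfer
    result = []
    for _ in range(scanlines):
        writes = []
        if not terminated:
            if do_transfer:             # steps 1-2: one pass through the mode
                for offset in offsets:
                    writes.append((offset, table[address]))
                    address += 1
            line_reg = (line_reg - 1) & 0xFF        # 3. decrement $43xA
            do_transfer = bool(line_reg & 0x80)     # 4. Do Transfer = Repeat
            if line_reg & 0x7F == 0:                # 5. Line Counter is zero
                line_reg = table[address]
                address += 1
                if line_reg == 0:
                    terminated = True
                do_transfer = True
        result.append(writes)
    return result
```

Running this on the table `[0x02, 0xAA, 0x83, 0x01, 0x02, 0x03, 0x00]` with `offsets=(0,)` writes `$AA` on the first scanline, nothing on the second, one byte on each of the next three, and then nothing once the `$00` terminator is reached.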

HDMA does not occur during V-Blank, as any writes it might perform are likely to have no visible effect anyway. The start-of-frame processing then resets all active channels at the end of V-Blank. This allows updating of the HDMA registers during V-Blank without worrying about the transfer beginning immediately and scribbling on the PPU state.

Note how the above implicitly defines the format of the HDMA table. Explicitly, the format is a series of entries. Each entry begins with a line count and repeat flag. If repeat is false, there is one scanline worth of data following and the count is the number of scanlines to wait before processing the next entry. If it's true, the line count is the number of scanlines worth of data following. The data following is either a pointer to the data (for Indirect HDMA), or the data itself (for Direct HDMA).
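
To make the table format concrete, here is a sketch of a builder for Direct HDMA tables. `build_direct_table` and its argument layout are inventions of this example; for simplicity it only accepts line counts of 1 to 127, and it always appends the `$00` terminator.

```python
def build_direct_table(entries, unit_size):
    """Encode entries as a Direct HDMA table.

    entries: list of (line_count, repeat, data) tuples.
    unit_size: bytes written per scanline for the chosen transfer mode.
    """
    out = bytearray()
    for line_count, repeat, data in entries:
        if not 1 <= line_count <= 0x7F:
            raise ValueError("line count must fit in bits 0-6 and be non-zero")
        # Repeat entries carry one unit per scanline; others carry one unit.
        expected = unit_size * (line_count if repeat else 1)
        if len(data) != expected:
            raise ValueError(f"expected {expected} data bytes, got {len(data)}")
        out.append((0x80 if repeat else 0x00) | line_count)
        out += bytes(data)
    out.append(0x00)  # terminate the channel for this frame
    return bytes(out)
```

For instance, one byte held for 2 scanlines followed by 3 scanlines of fresh data is `build_direct_table([(2, False, [0xAA]), (3, True, [1, 2, 3])], unit_size=1)`.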

Looking at the above, it's clear why Address, and Repeat/Line Counter must be initialized by hand when starting HDMA mid-frame: they're only automatically initialized at the start of the frame. Note how AAddress is not affected by HDMA, though Address and Repeat/Line Counter are.