DMA, or "direct memory access", is found in a number of computer systems, not just the Super Nintendo. It’s basically a way for a peripheral or coprocessor to read data directly from memory, instead of requiring the main CPU to do a number of reads and writes. This is typically faster, if only because it lets the system skip the opcode fetch-and-decode. In the SNES, the CPU is paused during DMA since the address buses are in use for the transfer.
HDMA is similar in concept, though rather different in execution: instead of transferring a block of memory all at once, it transfers a few bytes during the H-Blank period of each scanline. This is extremely helpful, as H-Blank is the only window during a frame in which most PPU registers can be changed without glitching.
The SNES has 8 channels (numbered 0-7) that can be used for either DMA or HDMA. HDMA takes priority over DMA if both are to occur at once, pausing all DMA and terminating a conflicting DMA immediately. Lower-numbered channels take priority over higher-numbered channels.
A DMA transfer has three main variables and a number of setting bits. These are (those marked ‘√’ must be set up before starting DMA):
See register $43x0 for the correspondence between the Mode bits and the transfer mode. Note that One Register Write Once and One Register Write Twice end up being the exact same thing, and Two Registers Write Once and Two Registers Write Twice Alternate are the same, but that Two Registers Write Once and Two Registers Write Twice Each are different.
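The equivalences above are easier to see when the register written by each successive byte is listed out. The following is only a sketch: the mode numbers and offset patterns follow the usual description of $43x0 and are an assumption here, since this section does not reproduce them, and the names MODE_OFFSETS and dma_offsets are hypothetical.

```python
from itertools import cycle, islice

# Offset added to the base B-bus register for each byte in one pass through
# the mode. Assumed from the usual $43x0 description, not from this section.
MODE_OFFSETS = {
    0: (0,),          # One Register Write Once
    1: (0, 1),        # Two Registers Write Once
    2: (0, 0),        # One Register Write Twice
    3: (0, 0, 1, 1),  # Two Registers Write Twice Each
    4: (0, 1, 2, 3),  # Four Registers Write Once
    5: (0, 1, 0, 1),  # Two Registers Write Twice Alternate
}


def dma_offsets(mode, nbytes):
    """Register offsets hit by the first nbytes bytes of a DMA transfer."""
    return list(islice(cycle(MODE_OFFSETS[mode]), nbytes))


for mode in sorted(MODE_OFFSETS):
    print(mode, dma_offsets(mode, 8))
```

Because DMA keeps cycling through the pattern until Count runs out, modes 0 and 2 produce the same sequence, as do modes 1 and 5, while mode 3 writes each register twice before moving on.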
DMA transfers take 8 master cycles per byte transferred, no matter the FastROM setting. There is also an overhead of 8 master cycles per channel, and an overhead of 12-24 cycles for the whole transfer.
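As a worked example of these figures, the sketch below estimates the cost of a DMA burst from the number of bytes moved by each enabled channel. The function name dma_cycles is hypothetical, and the result is returned as a range because the whole-transfer overhead is only known to lie between 12 and 24 cycles.

```python
def dma_cycles(byte_counts):
    """Return (min, max) master cycles for one DMA burst.

    byte_counts holds the number of bytes moved by each enabled channel.
    """
    # 8 cycles of overhead per channel, plus 8 cycles per byte transferred.
    per_channel = sum(8 + 8 * n for n in byte_counts)
    # The whole-transfer overhead is given as 12-24 cycles.
    return (12 + per_channel, 24 + per_channel)


# A single channel moving $800 bytes:
print(dma_cycles([0x800]))  # (16404, 16416)
```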
The basic process seems to be:
A. Get byte and write it to the destination.
B. Adjust AAddress.
C. Decrement Count.
Note that Count ($43x5-6) ends up always 0, unless a conflicting HDMA terminates the transfer early.
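Steps A to C can be sketched as a simple loop for a single channel. This is a model under assumptions, not a description of the hardware: it only covers reads from the A-bus side, it wraps AAddress at 16 bits, and it treats a starting Count of 0 as 65536 bytes. The name run_dma and its parameters are hypothetical.

```python
def run_dma(memory, a_address, count, step, write):
    """Sketch of steps A-C for a single channel.

    memory    -- bytes-like source, indexed by A-bus address
    a_address -- starting AAddress
    count     -- 16-bit Count; 0 is treated as 65536 (assumption)
    step      -- +1, -1 or 0, the AAddress adjustment per byte
    write     -- callback that receives each byte for the destination
    """
    while True:
        write(memory[a_address])                 # A. get byte and write it
        a_address = (a_address + step) & 0xFFFF  # B. adjust AAddress
        count = (count - 1) & 0xFFFF             # C. decrement Count
        if count == 0:
            break
    return a_address, count


out = []
end_address, end_count = run_dma(b"\x10\x20\x30\x40", 0, 3, 1, out.append)
print(out, end_address, end_count)  # [16, 32, 48] 3 0
```

The returned Count is always 0 here, matching the note above; modelling an early termination by HDMA would require breaking out of the loop before Count reaches zero.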
HDMA has 4 flags and 5 variables. Again, those marked ‘√’ are required before starting HDMA. In addition, those marked ‘+’ are required if HDMA is to be started mid-frame.
Modes are the same as for DMA. However, note that only one cycle through the mode is done per scanline, so One Register Write Once will write 1 byte per scanline, while One Register Write Twice will write two.
For each scanline during which HDMA is active (i.e. at least one channel is not paused and has not terminated yet for the frame), there are ~18 master cycles overhead. Each active channel incurs another 8 master cycles overhead (during which time $43xA is presumably loaded if necessary) for every scanline, whether or not a transfer actually occurs. If a new indirect address is required, 16 master cycles are taken to load it. Then 8 cycles per byte transferred are used. Thus, HDMA takes a maximum of 466 master cycles per scanline (if all 8 channels are active, require an indirect address load, and transfer 4 bytes).
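The per-scanline cost can be checked with a few lines of arithmetic. The sketch below uses the approximate 18-cycle figure as if it were exact, and assumes no cycles are spent when no channel is active; hdma_scanline_cycles is a hypothetical name.

```python
def hdma_scanline_cycles(channels):
    """Approximate master cycles used by HDMA on one scanline.

    channels -- one (bytes_transferred, loads_indirect_address) pair per
                active channel; bytes_transferred is 0 if no transfer occurs.
    """
    if not channels:
        return 0  # assumption: no overhead when HDMA is not active
    total = 18  # approximate fixed overhead for the scanline
    for nbytes, loads_indirect in channels:
        total += 8                 # per-channel overhead, always paid
        if loads_indirect:
            total += 16            # loading a new indirect address
        total += 8 * nbytes        # 8 cycles per byte transferred
    return total


# Worst case: 8 channels, each loading an indirect address and moving 4 bytes.
print(hdma_scanline_cycles([(4, True)] * 8))  # 466
```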
The basic process has two sections. First, at the beginning of the frame (V=0 H=approx 6), for all active HDMA channels (see register $420c):
The CPU is paused during this time. Overhead is ~18 master cycles, plus 8 master cycles for each channel set for direct HDMA and 24 master cycles for each channel set for indirect HDMA.
If you are starting HDMA mid-frame, you must basically do the init process manually by setting $43x8-A, and $43x5-6 for indirect channels. Note though that there is no way to perform step 4, so no transfer will be done during the first transfer period. Also, note that a channel that has already terminated for the frame cannot be restarted. XXX: Or does it automatically do Step 4 when you enable the channel?
Then, for each scanline from V=0 to V=$e0 (or V=$ef if overscan is enabled) at about H=$116:
1. If Do Transfer is false, skip to step 3.
2. For the number of bytes (1, 2, or 4) required for this Transfer Mode…
3. Decrement $43xA.
4. Set Do Transfer to the value of Repeat.
5. If Line Counter is zero…
6. Continue with Step 1 next scanline.
HDMA does not occur during V-Blank, as any writes it might perform are likely to have no visible effect anyway. The start-of-frame processing then resets all active channels at the end of V-Blank. This allows updating of the HDMA registers during V-Blank without worrying about the transfer beginning immediately and scribbling on the PPU state.
Note how the above implicitly defines the format of the HDMA table. Explicitly, the format is a series of entries. Each entry begins with a line count and repeat flag. If repeat is false, there is one scanline worth of data following and the count is the number of scanlines to wait before processing the next entry. If it’s true, the line count is the number of scanlines worth of data following. The data following is either a pointer to the data (for Indirect HDMA), or the data itself (for Direct HDMA).
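To make the table format concrete, here is a small sketch that encodes a Direct HDMA table from a list of entries. The byte layout is an assumption not spelled out in this section: the entry header is taken to hold the repeat flag in bit 7 and the line count in bits 0-6, and a $00 byte is taken to end the table. Line counts are limited to 1-127 to stay clear of edge cases. The name build_direct_hdma_table is hypothetical.

```python
def build_direct_hdma_table(entries, unit_size):
    """Encode entries as a Direct HDMA table.

    entries   -- list of (line_count, repeat, data) tuples
    unit_size -- bytes written per scanline by the transfer mode (1, 2 or 4)
    """
    table = bytearray()
    for line_count, repeat, data in entries:
        if not 1 <= line_count <= 0x7F:
            raise ValueError("line count must be between 1 and 127")
        # Repeat entries carry one unit per scanline; others carry one unit.
        expected = unit_size * (line_count if repeat else 1)
        if len(data) != expected:
            raise ValueError(f"entry needs {expected} data bytes, got {len(data)}")
        # Assumed header layout: bit 7 = repeat flag, bits 0-6 = line count.
        table.append((0x80 if repeat else 0x00) | line_count)
        table.extend(data)
    table.append(0x00)  # assumed end-of-table marker
    return bytes(table)


table = build_direct_hdma_table(
    [
        (32, False, b"\x0f"),        # one write, next entry 32 scanlines later
        (3, True, b"\x0e\x0d\x0c"),  # a fresh byte on each of 3 scanlines
    ],
    unit_size=1,
)
print(table.hex(" "))  # 20 0f 83 0e 0d 0c 00
```

For Indirect HDMA the same headers would be followed by a pointer instead of the data itself, which this sketch does not cover.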
Looking at the above, it’s clear why Address and Repeat/Line Counter must be initialized by hand when starting HDMA mid-frame: they’re only automatically initialized at the start of the frame. Note how AAddress is not affected by HDMA, though Address and Repeat/Line Counter are.