SEGA Genesis: Framebuffer Rendering

Mandelbrot on SEGA Genesis using framebuffer

As discussed in earlier articles, the SEGA Genesis renders graphics using a combination of tiles and sprites, and does not directly support framebuffer rendering. However, the Genesis Video Display Processor (VDP) can be tricked into doing this by setting up a fixed playfield character mapping and rendering directly to character data. In this article, different types of graphics will be discussed, and sample framebuffer implementations on the Genesis will be shown.

Terms

The following terms are used throughout this article, so it is useful to define them here:

Bitmap: Spatially mapped pixel array, a sequence of pixels ordered in memory in a similar fashion to how they should be displayed.
Framebuffer: Memory containing a bitmap, driving a video display.

Types of Rendering

To appreciate the difference between framebuffer and tile-based rendering, it is useful to look at a few foundational types of computer graphics rendering used by video games throughout the years.

Vector Graphics

'Spacewar!' (PDP-1 1962), vector graphics

The very first games, such as Steve Russell's Spacewar! (1962), controlled the electron beam of their cathode ray tube (CRT) displays directly, drawing lines of varying light intensity between points. This form of graphics is known as vector graphics.

Atari would create particularly famous examples of vector graphics rendering, including 'Lunar Lander' and 'Asteroids' from 1979, first-person shooter 'Battlezone' from 1980, and 'Star Wars' from 1983. In 1982, the Vectrex, a home console with a built-in vector display designed by Los Angeles-based Smith Engineering, was released.

Vector displays have become rare, but it is highly recommended to try out original arcade games like Asteroids to see this type of display. The contrast and smoothness can look very different from a normal CRT display.

Raster Bar Graphics

'Breakout' (Arcade 1976), raster bar graphics

In home TV screens, the electron beam would automatically scan the screen left-to-right, top-to-bottom, 50 or 60 times per second in a horizontal line pattern. This fixed pattern is called 'raster scanning'. Atari's 'Pong' (1972) was designed to be compatible with this type of TV display, and instead of controlling the electron beam's position freely, the beam output intensity would be controlled as a function of the x- and y-position of the electron beam, which would create colored areas on the screen. This approach normally results in blocky graphics with right angles. We might call this 'Raster Bar Graphics'.

The approach of changing the beam color on the fly would be used extensively on the Atari 2600 home console, where incredibly simple graphics hardware forced developers to make very creative choices. Read Nick Montfort and Ian Bogost's 'Racing the Beam' for great stories about the challenges of Atari 2600 development.

Tiles and Sprites

'Pac-Man' (Arcade 1980), tile- and sprite-based graphics

Graphics in many video games from the 1980s and early 1990s were implemented using a combination of tiles, tilemaps, and sprites. Tiles and sprites would be defined using bitmaps, spatially mapped arrays of pixel color data, and tilemaps would map tiles to display positions in a fixed grid on the screen. Video controllers would take this representation of tiles and sprites and display it directly on the screen. We will refer to this type of graphics as tile-based graphics.

The benefit of tile-based graphics is lower memory usage, something absolutely critical during this era, as memory chips were prohibitively expensive. With tiles and tilemaps, full screens of graphics can be represented using a small set of tiles, repeated in the tilemap to form larger graphics. Essentially, it is a form of manual image compression.
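To make the memory saving concrete, here is a minimal sketch in Python. It is a host-side model, not Genesis code, and the 2x2-pixel tiles, the tile set, and the map are all made up for illustration.

```python
# Hypothetical toy example: 2x2-pixel tiles, values are palette indices.
TILE = 2
tiles = {
    0: [[0, 0], [0, 0]],   # empty tile
    1: [[1, 1], [1, 1]],   # solid wall tile
}
tilemap = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]

def expand(tilemap, tiles, size=TILE):
    """Expand a tilemap into the full bitmap it represents."""
    bitmap = []
    for map_row in tilemap:
        for line in range(size):
            row = []
            for tile_id in map_row:
                row.extend(tiles[tile_id][line])
            bitmap.append(row)
    return bitmap

bitmap = expand(tilemap, tiles)
stored = len(tiles) * TILE * TILE + sum(len(r) for r in tilemap)
full = sum(len(r) for r in bitmap)
print(stored, "values stored instead of", full)
```

Even in this tiny example the tile set plus the map is smaller than the bitmap it expands to, and the gap grows quickly as tiles are reused across a larger map.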

A classic example of tile-based graphics is Namco's Pac-Man (1980), where the maze and dots are displayed using tiles. This rendering type would be ubiquitous in arcade games in the 1980s and 1990s, and several home consoles would be designed this way, including Nintendo's NES (released in Japan as the Famicom in 1983) and, of course, the SEGA Genesis (released in Japan as the Mega Drive in 1988).

Framebuffer Rendering

'Virus' (Amiga 1988), software 3D filled polygon rendering

'Wolfenstein 3D' (DOS 1992), raycasting 3D rendering

Computers from this era would also have tiles and sprites, but would often require more flexibility to render application GUIs, graphs, word processing, and the like.

This was accomplished by rendering graphics in software to a bitmap in memory, called a framebuffer, which was then displayed. This approach required more memory, but enabled the ultimate rendering flexibility. First off, the previously mentioned rendering techniques could be supported: vector graphics could be emulated in software and rendered to a framebuffer, tiles and sprites could be supported in hardware and rendered on top of a framebuffer, and raster bar graphics could be emulated by modifying the color palette during scanline rendering.

Besides emulating existing techniques, framebuffer rendering enabled new techniques such as filled polygon rendering or raycasting 3D graphics.

SEGA Genesis Rendering

'Zombies Ate My Neighbors' (1993), tile-based graphics

The SEGA Genesis does not have a hardware framebuffer, and rendering to a bitmap in RAM is not immediately supported. Most games use the VDP's built-in tile-based rendering with an overlay of sprites. However, if we want to do framebuffer rendering, an approach could be setting up a fixed playfield map where every cell on the screen is mapped to a unique character, and rendering individual pixels to characters in VRAM, which is then automatically displayed by the VDP.

This would enable types of graphics that are normally impossible on the Genesis, such as 3D polygons.
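The fixed playfield mapping can be sketched as follows. This is a host-side Python model rather than Genesis code. The 16-bit map entry layout (priority, palette, flip bits, 11-bit character index) and the 32-cell plane width are assumptions taken from commonly documented VDP behaviour, and the function names are hypothetical.

```python
COLS, ROWS = 32, 28        # visible cells in 32 cell mode
PLANE_WIDTH = 32           # assumed plane width in cells

def map_entry(tile, palette=0, priority=0, hflip=0, vflip=0):
    """Pack one 16-bit playfield map entry (bit layout is an assumption)."""
    return ((priority << 15) | (palette << 13) |
            (vflip << 12) | (hflip << 11) | (tile & 0x7FF))

def identity_map(first_tile=0):
    """Map cell (col, row) to the unique character first_tile + row*COLS + col."""
    table = [0] * (PLANE_WIDTH * ROWS)
    for row in range(ROWS):
        for col in range(COLS):
            tile = first_tile + row * COLS + col
            table[row * PLANE_WIDTH + col] = map_entry(tile)
    return table

table = identity_map()
```

With this map in place, character n is always shown at cell (n % 32, n / 32), so writing pixel data into character n changes exactly that 8x8 area of the screen.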

This approach should work, but the layout of characters in VRAM is not the most intuitive for raster graphics. We can simplify rendering by introducing a RAM framebuffer with a simple layout. This separates the rendering itself from mapping pixels to the layout of characters in VRAM. We can render pixels to the framebuffer, and then copy the framebuffer to character data in VRAM with the appropriate translation of memory layout.

First, we should check whether this approach is even possible on the Genesis hardware.

Is Framebuffer Rendering Possible on Genesis?

Linear bitmap framebuffer test pattern - SEGA Genesis

So, how much does a fullscreen framebuffer take up in memory? Can we fit a framebuffer into 64 KB RAM?

The Genesis has two basic display modes, 32 cell and 40 cell mode (256 and 320 pixels wide, respectively). In this example, we'll use 32 cell mode (256 pixels). In this mode, we have:

32 x 28 = 896 chars

How much memory does a single character take up?

- Each char is 8x8 pixels
- Each pixel is 4 bits of color (0-15)
- Each char:

  8 * 8 pixels * 4 bits/pixel = 256 bits/char
  256 bits/char / 8 bits/B    = 32 B/char

In total, the 32 cell framebuffer needs

896 chars * 32 B/char = 28672 B = 28 KB ($7000 B)

So, to answer our question: yes. We can fit not just one but two full 28 KB framebuffers (56 KB in total) into 64 KB of RAM, although that leaves only 8 KB for everything else.
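The same arithmetic can be repeated for both display modes with a few lines of Python. This is only a calculator for the numbers above; the constant and function names are made up for this sketch.

```python
BYTES_PER_CHAR = 8 * 8 * 4 // 8      # 8x8 pixels, 4 bits each -> 32 bytes
RAM_BYTES = 64 * 1024
ROWS = 28

def framebuffer_bytes(cells_wide, rows=ROWS):
    """Size in bytes of a fullscreen framebuffer for the given cell width."""
    return cells_wide * rows * BYTES_PER_CHAR

for cells in (32, 40):
    size = framebuffer_bytes(cells)
    print(f"{cells} cell mode: {size} B (${size:X}), "
          f"{RAM_BYTES // size} buffer(s) fit in RAM")
```

By the same calculation, a 40 cell framebuffer needs 35 KB, so only one of those fits into RAM.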

The RAM framebuffer could have a multitude of layouts. As long as the procedure for copying to VRAM is simple enough to be reasonably fast, many options exist.

Let's consider two basic layout choices for the framebuffer: linear bitmap and character bitmap. In the linear bitmap, pixels are ordered sequentially in memory. In the character bitmap, memory is organized into 8x8 characters.

Linear Bitmap Framebuffer

Color cycling a generated framebuffer

The linear bitmap framebuffer has the same layout as the screen. Rendering code will be as simple as it gets.

Simple rendering: Framebuffer is laid out as a matrix of display pixels, each pixel is 4 bits of color.
Non-trivial VRAM copy: Copying to VRAM is conceptually simple, but slow, because the data must be reordered into 8x8 characters.

Example linear 256x224 pixel bitmap (x and y values are pixel positions, the numbers in the table denote memory byte offsets in hexadecimal, and every 8-bit byte represents two 4-bit color palette indices):

x/y   0   2   4   6     8   10  12  14         248 250 252 254
     ---------------------------------------------------------
 0 |   0   1   2   3 |   4   5   6   7 | ... |  7C  7D  7E  7F
 1 |  80  81  82  83 |  84  85  86  87 | ... |  FC  FD  FE  FF
 2 | 100 101 102 103 | 104 105 106 107 | ... | 17C 17D 17E 17F
 3 | 180 181 182 183 | 184 185 186 187 | ... | 1FC 1FD 1FE 1FF
 4 | 200 201 202 203 | 204 205 206 207 | ... | 27C 27D 27E 27F
 5 | 280 281 282 283 | 284 285 286 287 | ... | 2FC 2FD 2FE 2FF
 6 | 300 301 302 303 | 304 305 306 307 | ... | 37C 37D 37E 37F
 7 | 380 381 382 383 | 384 385 386 387 | ... | 3FC 3FD 3FE 3FF
   | ----------------------------------  ...  ----------------
 8 | 400 401 402 403 | 404 405 406 407 | ... | 47C 47D 47E 47F
 9 | 480 481 482 483 | 484 485 486 487 | ... | 4FC 4FD 4FE 4FF
     :
     etc.
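The offsets in the table follow directly from the pixel coordinates. Here is a minimal host-side sketch in Python, assuming that the left (even x) pixel of each byte is stored in the high nibble; the function names are hypothetical.

```python
WIDTH, HEIGHT = 256, 224
BYTES_PER_LINE = WIDTH // 2          # two 4-bit pixels per byte -> $80

def linear_offset(x, y):
    """Byte offset of pixel (x, y) in the linear framebuffer."""
    return y * BYTES_PER_LINE + x // 2

def putpixel_linear(fb, x, y, color):
    """Write a 4-bit color; even x is assumed to use the high nibble."""
    i = linear_offset(x, y)
    if x & 1:
        fb[i] = (fb[i] & 0xF0) | (color & 0x0F)
    else:
        fb[i] = (fb[i] & 0x0F) | ((color & 0x0F) << 4)

fb = bytearray(BYTES_PER_LINE * HEIGHT)
putpixel_linear(fb, 8, 1, 0xA)       # lands in byte $84, high nibble
putpixel_linear(fb, 9, 1, 0x3)       # same byte, low nibble
```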

Example code for copying a linear framebuffer from RAM to VRAM. It reads 8x8 squares of pixels from the framebuffer and copies them to character data in VRAM, one character at a time. Full source code on GitHub:

vdp_control = $C00004
vdp_data    = $C00000

; ----------------------------------------------------------------------------
; copy_framebuf( framebuf_addr (A0.L), VRAM_char_memory (D0.w) )
; ----------------------------------------------------------------------------
; Copies linear framebuffer from framebuf_addr in RAM (e.g. $E00000) to
; VRAM_char_memory in VRAM (e.g. $0000)
;
; One line of a char is 8 pixels wide, 4 bits/pixel = 32 bits = 4 bytes
;
; RAM base + $000        $004         $008            $07C
;            Char0Line0  Char1Line0   Char2Line0  ... Char31Line0
; RAM base + $080        $084                         $0FC
;            Char0Line1  Char1Line1   Char2Line1  ... Char31Line1
; RAM base + $100        $104                         $1FC
;            Char0Line2  Char1Line2   Char2Line2  ... Char31Line2

copy_framebuf
    movem.l d0-d7/a0-a6,-(sp) ; push all registers to stack

    move.w  sr,-(sp)           ; save status register on stack
    move    #$2700,sr          ; disable interrupts
    move.w  #$8F02,vdp_control ; Set VDP autoincrement to 2 bytes (1 word) per write

    ; setup_vram_write( addr (D0.w) )
    jsr     setup_vram_write

    move.w  #28-1,d5           ; d5 : char row idx

write_char_row

    move.w  #32-1,d6           ; d6 : char idx

write_char 
    move.w  #8-1,d7
    move.l  a0,a1

.write_line
    move.l  (a1),d0            ; d0 = line pixels
    move.l  d0,vdp_data        ; write line to VRAM
    add.l   #$80,a1            ; next line
    dbra    d7,.write_line

.next_char
    add.l   #4,a0
    dbra    d6,write_char

    add.l   #$380,a0
    dbra    d5,write_char_row

    move.w  (sp)+,sr           ; restore status register
    movem.l (sp)+,d0-d7/a0-a6  ; pop all registers from stack

    rts

; ----------------------------------------------------------------------------
; setup_vram_write( addr (D0.w) )
; ----------------------------------------------------------------------------
setup_vram_write
    movem.l   d0/d5-d6,-(sp)

;    Bits    [BBAA AAAA AAAA AAAA 0000 0000 BBBB 00AA]
;    B order [10.. .... .... .... .... .... 5432 ....] oper. type
;    A order [..DC BA98 7654 3210 .... .... .... ..FE] address
;    -------------------------------------------------------------
;    VRAM Write (00 0001) to addr $0000 (0000 0000 0000 0000):
;            [01.. .... .... .... .... .... 0000 ....] oper. type
;            [0100 0000 0000 0000 .... .... 0000 ..00] VRAM address
;            [0100 0000 0000 0000 0000 0000 0000 0000] add zeroes
;      Hex:      4    0    0    0    0    0    0    0

; > A is destination address, in this order:
; [..DC BA98 7654 3210 .... .... .... ..FE]

    ; Note: d0 is a *word*, may have garbage in upper 16 bit

    move.l  d0,d5                 ; d5 = (???? ???? ???? ???? FEDC BA98 7654 3210)

    and.l   #%1100000000000000,d5 ; d5 = (0000 0000 0000 0000 FE00 0000 0000 0000)
    lsr.l   #7,d5                 ; d5 = (0000 0000 0000 0000 0000 000F E000 0000)
    lsr.l   #7,d5                 ; d5 = (0000 0000 0000 0000 0000 0000 0000 00FE)

    move.l  d0,d6                 ; d6 = (???? ???? ???? ???? FEDC BA98 7654 3210)

    and.l   #%0011111111111111,d6 ; d6 = (0000 0000 0000 0000 00DC BA98 7654 3210)
    lsl.l   #4,d6            ; d6 = (0000 0000 0000 00DC BA98 7654 3210 0000)
    lsl.l   #4,d6            ; d6 = (0000 0000 00DC BA98 7654 3210 0000 0000)
    lsl.l   #4,d6            ; d6 = (0000 00DC BA98 7654 3210 0000 0000 0000)
    lsl.l   #4,d6            ; d6 = (00DC BA98 7654 3210 0000 0000 0000 0000)

    move.l  #$40000000,d0 ; VRAM write command
    or.l    d6,d0            ; d0 = (01DC BA98 7654 3210 0000 0000 0000 0000)
    or.l    d5,d0            ; d0 = (01DC BA98 7654 3210 0000 0000 0000 00FE)
    move.l  d0,vdp_control   ; VDP write

    movem.l   (sp)+,d0/d5-d6

    rts
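The bit shuffling in setup_vram_write can be checked with a small host-side model. The Python sketch below mirrors the masks and shifts of the routine above; the function name is made up for illustration.

```python
def vram_write_command(addr):
    """Build the 32-bit VDP control word for a VRAM write to addr."""
    addr &= 0xFFFF                       # only the low word is meaningful
    low14 = (addr & 0x3FFF) << 16        # A13..A0  -> bits 29..16
    high2 = (addr & 0xC000) >> 14        # A15..A14 -> bits 1..0
    return 0x40000000 | low14 | high2

for addr in (0x0000, 0x1234, 0xC000):
    print(f"${addr:04X} -> ${vram_write_command(addr):08X}")
```

For address $0000 this yields $40000000, matching the worked example in the comments of the routine.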

This approach is not fast enough for real-time games, but works well enough as a proof of concept.
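To see how much work the copy involves, the reordering can be modelled on a host machine. The Python sketch below is not Genesis code; it mirrors the loop structure of copy_framebuf and returns the bytes in the order they would be written to VRAM. The names are hypothetical.

```python
CHAR_COLS, CHAR_ROWS = 32, 28
LINE_BYTES = 0x80          # bytes per pixel line in the linear framebuffer
CHAR_LINE_BYTES = 4        # 8 pixels * 4 bits

def linear_to_chars(fb):
    """Reorder a linear framebuffer into 8x8 character data."""
    out = bytearray()
    for char_row in range(CHAR_ROWS):
        for char_col in range(CHAR_COLS):
            # Top-left byte of this character in the linear layout.
            src = char_row * 8 * LINE_BYTES + char_col * CHAR_LINE_BYTES
            for line in range(8):
                start = src + line * LINE_BYTES
                out += fb[start:start + CHAR_LINE_BYTES]
    return out

fb = bytearray(LINE_BYTES * 8 * CHAR_ROWS)
fb[0x84] = 0xA3            # pixels (8,1) and (9,1) in the linear layout
chars = linear_to_chars(fb)
```

The innermost loop body runs 28 * 32 * 8 = 7168 times per frame, each iteration moving one 4-byte character line, which is why the per-line copy is costly on the 68000.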

An alternative approach uses the same memory layout as the characters in VRAM, which should make for significantly improved performance when copying the framebuffer to VRAM.

Character Bitmap Framebuffer

The character bitmap framebuffer has a less intuitive memory layout, but should simplify copying the framebuffer to VRAM.

Non-trivial rendering: Putting a pixel on screen requires a few calculations.
Framebuffer is laid out as an exact copy of character data in VRAM.
Simple VRAM copy: Optimal performance when copying to VRAM, and enables using DMA to copy.

Example character 256x224 pixel bitmap (x and y values are pixel positions, the numbers in the table denote memory byte offsets in hexadecimal, and every 8-bit byte represents two 4-bit color palette indices):

x/y   0   2   4   6     8   10  12  14         248 250 252 254
     ---------------------------------------------------------
 0 |   0   1   2   3 |  20  21  22  23 | ... | 3E0 3E1 3E2 3E3
 1 |   4   5   6   7 |  24  25  26  27 | ... | 3E4 3E5 3E6 3E7
 2 |   8   9   A   B |  28  29  2A  2B | ... | 3E8 3E9 3EA 3EB
 3 |   C   D   E   F |  2C  2D  2E  2F | ... | 3EC 3ED 3EE 3EF
 4 |  10  11  12  13 |  30  31  32  33 | ... | 3F0 3F1 3F2 3F3
 5 |  14  15  16  17 |  34  35  36  37 | ... | 3F4 3F5 3F6 3F7
 6 |  18  19  1A  1B |  38  39  3A  3B | ... | 3F8 3F9 3FA 3FB
 7 |  1C  1D  1E  1F |  3C  3D  3E  3F | ... | 3FC 3FD 3FE 3FF
   | ----------------------------------  ...  ----------------
 8 | 400 401 402 403 | 420 421 422 423 | ... | 7E0 7E1 7E2 7E3
 9 | 404 405 406 407 | 424 425 426 427 | ... | 7E4 7E5 7E6 7E7
     :
     etc.

Rendering translates screen x- and y-coordinates to character layout. The translation should be performed like this:

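putpixel(x,y):                     (all divisions are integer divisions)
  xoff_bytes = (x/2)%4             byte within the 4-byte character line
  xoff_tile  = ((x/8)%32)*$20      32 bytes per character
  yoff       = (y%8)*4             line within the character
  yoff_tile  = (y/8)*$400          32 characters per character row
  addr = xoff_bytes + xoff_tile + yoff + yoff_tile
  (even x is the high nibble of the byte at addr, odd x the low nibble)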
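The translation can be written out as a runnable host-side sketch in Python. Note that the line offset uses y % 8 so that it restarts in each character row, and that the left (even x) pixel is assumed to occupy the high nibble of its byte. The function names are hypothetical.

```python
def char_offset(x, y):
    """Byte offset of pixel (x, y) in the character-ordered framebuffer."""
    xoff_bytes = (x // 2) % 4          # byte within the 4-byte character line
    xoff_tile = (x // 8) * 0x20        # 32 bytes per character
    yoff = (y % 8) * 4                 # line within the character
    yoff_tile = (y // 8) * 0x400       # 32 characters per character row
    return xoff_bytes + xoff_tile + yoff + yoff_tile

def putpixel(fb, x, y, color):
    """Write a 4-bit color index at (x, y)."""
    i = char_offset(x, y)
    if x & 1:
        fb[i] = (fb[i] & 0xF0) | (color & 0x0F)         # odd x: low nibble
    else:
        fb[i] = (fb[i] & 0x0F) | ((color & 0x0F) << 4)  # even x: high nibble

fb = bytearray(0x7000)
putpixel(fb, 9, 9, 0xC)
putpixel(fb, 8, 9, 0x5)
```

Both pixels land in byte $424, which is the second line of the second character in the second character row.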

The algorithm for copying the character-based framebuffer to VRAM is trivial: simply copy 28 KB of RAM to VRAM. Because there is no change in memory layout, the copy can be performed using a DMA copy.
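As a rough illustration of what setting up such a DMA copy involves, the Python sketch below computes the values that would be written to the VDP control port. It is a host-side calculator, not Genesis code, and it rests on assumptions from commonly documented VDP behaviour: registers $93/$94 hold the length in words, registers $95 to $97 hold the source address divided by 2, and the DMA flag is bit 7 of the write command. DMA must also be enabled in the VDP mode registers, which is not shown. The function names are made up.

```python
def dma_register_words(src_addr, length_bytes):
    """16-bit register writes that describe a RAM-to-VRAM DMA (assumed layout)."""
    words = length_bytes // 2
    src = (src_addr & 0xFFFFFF) >> 1
    return [
        0x9300 | (words & 0xFF),          # length, low byte
        0x9400 | ((words >> 8) & 0xFF),   # length, high byte
        0x9500 | (src & 0xFF),            # source address / 2, low byte
        0x9600 | ((src >> 8) & 0xFF),     # source address / 2, middle byte
        0x9700 | ((src >> 16) & 0x7F),    # source address / 2, high bits
    ]

def dma_vram_write_command(vram_addr):
    """VRAM write command with the DMA bit (bit 7) set."""
    return (0x40000080 | ((vram_addr & 0x3FFF) << 16)
            | ((vram_addr >> 14) & 3))

regs = dma_register_words(0xFF0000, 0x7000)   # 28 KB framebuffer at start of RAM
print([f"${w:04X}" for w in regs], f"${dma_vram_write_command(0):08X}")
```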

Alternative Framebuffer Layouts

Other framebuffer layouts could be used to implement completely different types of raster graphics. For instance, the framebuffer bitmap could use a different color depth, such as 1-bit color, or more than 16 colors, with e.g. dithering used to emulate them when transferring to VRAM.

Another approach could use a low-resolution framebuffer that gets scaled when transferring to VRAM.
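One way this could look is sketched below in Python, as a host-side model. The 128x112 buffer size, the one-color-index-per-entry storage, and the function name are assumptions for illustration. The output uses the linear layout with two 4-bit pixels per byte, so a horizontally doubled pixel fills exactly one byte.

```python
LOW_W, LOW_H = 128, 112   # hypothetical half-resolution buffer

def scale2x(low):
    """Scale a LOW_W x LOW_H buffer to a 256x224 linear framebuffer."""
    out = bytearray()
    for y in range(LOW_H):
        row = low[y * LOW_W:(y + 1) * LOW_W]
        # color * $11 puts the same 4-bit index in both nibbles.
        line = bytes((c & 0xF) * 0x11 for c in row)
        out += line + line            # emit each line twice
    return out

low = [0] * (LOW_W * LOW_H)
low[1] = 5
scaled = scale2x(low)
```

Rendering then only has to touch a quarter of the pixels, at the cost of a blockier image.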

Graphical fidelity can be sacrificed for much better performance: instead of writing to the character data in VRAM, use a fixed set of characters and write the framebuffer contents to the playfield map. If each pixel in the framebuffer is mapped to a single tile, we have something equivalent to ASCII art, but it is also possible to achieve higher resolutions by using semigraphical characters.
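A minimal sketch of the semigraphics idea, in Python as a host-side model: a 64x56 1-bit framebuffer is turned into 32x28 character indices, with each 2x2 block of bits selecting one of 16 pre-drawn characters. The character numbering and all names are assumptions made for this example.

```python
CELLS_W, CELLS_H = 32, 28

def semigraphic_map(bits, first_tile=0):
    """bits: 56 rows of 64 values (0 or 1). Returns 32x28 character indices.

    Assumes 16 pre-drawn characters starting at first_tile, where character n
    shows the 2x2 block pattern n (bit 3 = top-left ... bit 0 = bottom-right).
    """
    table = []
    for cy in range(CELLS_H):
        for cx in range(CELLS_W):
            tl = bits[2 * cy][2 * cx]
            tr = bits[2 * cy][2 * cx + 1]
            bl = bits[2 * cy + 1][2 * cx]
            br = bits[2 * cy + 1][2 * cx + 1]
            table.append(first_tile + ((tl << 3) | (tr << 2) | (bl << 1) | br))
    return table

bits = [[0] * 64 for _ in range(56)]
bits[0][1] = 1     # top-right of the first cell
bits[1][0] = 1     # bottom-left of the first cell
cells = semigraphic_map(bits)
```

Only 896 map entries need to be written per frame, compared to 28 KB of character data for the full framebuffer.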

References

'Racing the Beam' (2009) by Nick Montfort and Ian Bogost
SEGA2.DOC - VDP information and terminology