
Sunday, July 29, 2018

ChibiTerm High Level Design


Analysis

640 x 400/480 VGA text mode uses a 25.175MHz pixel clock. Most video monitors can accommodate 25MHz, a frequency commonly used for Ethernet, so crystals and oscillators are easily available at a much lower price.

There is a bit of overclocking involved in this project, so YMMV.

Over-clocking analysis

I have tested the STM32F030 SPI at 25MHz on three chips with different date codes. It works fine for the SCK and MOSI signals in 8-bit master mode.

I am trying to deduce where the 18MHz SPI limitation on the datasheet comes from. The APB peripherals on the STM32F0 value line are qualified to work at 48MHz. The I/O (clock to data out) timing on the datasheet is actually very tight and has plenty of margin for a much higher frequency.

The SPI clock frequency is specified at 18MHz for some reason.



Here is what I think happened: someone copied the numbers from a different product line.

The following is from their 72MHz STM32F10X line. 72MHz is probably a bit too fast for the SPI block, so they divide the frequency down to 36MHz. There is a further divide by 2 inside the SPI module for the SCK clock. Hence the maximum data rate is: PCLK/2 = 36MHz/2 = 18MHz.
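
The divide-by-2 arithmetic is easy to check. The helper below is only an illustrative sketch (the function name is made up); it assumes the SPI baud rate divider is set to its smallest value of 2, as described above.

```c
#include <stdint.h>

/* SCK frequency for a given peripheral clock (PCLK) and SPI divider.
 * Hypothetical helper, for illustration only. */
uint32_t spi_sck_hz(uint32_t pclk_hz, uint32_t divider)
{
    return pclk_hz / divider;
}
```

With a 36MHz PCLK this gives the 18MHz datasheet figure. With the 50MHz clock used in this project it gives the 25MHz pixel clock, which is where the overclocking comes in.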


Video generation

Most of the other VGA projects use a mid-range ARM Cortex-M with a lot more RAM and/or render the output at a lower resolution. It is easier to ignore cost and throw resources at a problem. I am trying to use the "value" line STM32F0 for cost reasons and as a technical challenge. The part is available here at $0.44 at QTY 10.

My challenge is to implement a working subset of a VT100 terminal while emulating the missing hardware using a high level language and working within memory constraints similar to those of the original VT100. For VT100 compatibility verification, I am using a standard open source test suite called VTTest.

Initially I planned to use external SPI RAM as the frame buffer for a full graphic + colour display, but I scaled back to work with only on-chip resources. The amount of RAM in this part is only 4KB, which is not sufficient to store the entire frame buffer. The alternative is to generate all the scan lines at full resolution (640x400 or 640x480) on the fly for each frame.
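
To put numbers on the memory problem, here is a rough sketch of the buffer sizes involved. It assumes a monochrome 1 bit-per-pixel frame buffer and one byte per character cell. The 80 x 24 grid in the usage note is the classic VT100 layout and is an assumption for illustration, not necessarily the grid used in this firmware.

```c
#include <stdint.h>

/* Bytes needed for a 1 bit-per-pixel frame buffer. */
uint32_t framebuffer_bytes(uint32_t width, uint32_t height)
{
    return (width * height) / 8u;
}

/* Bytes needed for a character buffer, one byte per character cell. */
uint32_t textbuffer_bytes(uint32_t columns, uint32_t rows)
{
    return columns * rows;
}
```

A 640x400 frame buffer needs 32000 bytes and a 640x480 one needs 38400 bytes, both far beyond 4KB. A character buffer for an assumed 80 x 24 grid needs only 1920 bytes.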

The block diagram is actually very similar to the "6845 CRTC" inside old terminals. It is all implemented inside the STM32F030F4 microcontroller.



  • The 50MHz ARM Cortex-M0 core is the "Processor"
  • The internal RAM is used as "Refresh RAM" that holds a character buffer as well as a scan line buffer. The buffer is used to decouple the (faster than real time) rendering process from the video generation.
  • A hardware timer is used to generate the HS (Horizontal sync), and an IRQ is used to keep track of the current scan line ("Refresh Memory Address") for rendering, VS (Vertical sync) generation and ping-pong buffer management.
  • DMA transfers the rendered scan line from the buffer to the SPI, which acts as the "Shift Register"
  • A firmware rendering routine described below is used to implement the "ROM Character Generator" function.
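
The ping-pong buffer management mentioned in the list above can be sketched roughly as follows. This is not the actual firmware: the struct and function names are hypothetical, and the 80-byte line size simply assumes 640 monochrome pixels packed 8 per byte.

```c
#include <stdint.h>

#define LINE_BYTES 80u  /* assumed: 640 pixels / 8 pixels per byte */

/* Two scan line buffers: DMA shifts one out while the renderer fills
 * the other. All names here are hypothetical. */
typedef struct {
    uint8_t line[2][LINE_BYTES];
    uint8_t dma_index;           /* buffer currently owned by DMA */
} PingPong;

/* Buffer the renderer may write into. */
uint8_t *pingpong_render_buf(PingPong *pp)
{
    return pp->line[pp->dma_index ^ 1u];
}

/* Buffer the DMA should read from. */
const uint8_t *pingpong_dma_buf(const PingPong *pp)
{
    return pp->line[pp->dma_index];
}

/* Called once per scan line: the freshly rendered line becomes the
 * DMA source and the old DMA buffer is free for rendering. */
void pingpong_swap(PingPong *pp)
{
    pp->dma_index ^= 1u;
}
```

Because the two roles never share a buffer, the renderer can run ahead of the video output without any per-byte synchronization.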

The rendering process


  • A text buffer is used to store the characters to be displayed.
  • For each scanline, for each character in the active row:
  1. look up the corresponding row (n) in the font bitmap table
  2. copy the bitmap into the scanline buffer
  • A hardware timer (TIM3) keeps track of the precise timing and triggers an IRQ to start the DMA, which copies the previously rendered scanline via SPI to the MOSI pin at the pixel clock rate of 25MHz.
  • The MOSI logic level signal is converted into a 0V to 0.7V signal to drive the R/G/B input of a monitor.
  • The Horizontal sync signal is triggered off hardware timer output compare.
  • An IRQ for timer overflow keeps track of the scan lines and generates the vertical sync.
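
The font lookup step in the list above can be sketched as a simple loop. This is a simplified illustration rather than the optimized routine used in the firmware. The 80-column width follows from 640 pixels at 8 pixels per character, while the 16-line glyph height, the font table layout and all names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define COLS        80u  /* 640 pixels / 8 pixels per character */
#define FONT_HEIGHT 16u  /* assumed glyph height in scan lines */

/* Render one scan line of one text row.
 * text: COLS character codes for the active text row
 * font: glyph bitmaps, FONT_HEIGHT bytes per character code
 * n:    row of the glyph this scan line falls on (0..FONT_HEIGHT-1)
 * out:  COLS bytes, 8 pixels each, later shifted out by the SPI */
void render_scanline(const uint8_t *text,
                     const uint8_t (*font)[FONT_HEIGHT],
                     unsigned n, uint8_t *out)
{
    for (size_t col = 0; col < COLS; col++)
        out[col] = font[text[col]][n];
}
```

For scan line y of the frame, the active text row would be y / FONT_HEIGHT and n would be y % FONT_HEIGHT.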
The following shows the generated VGA Sync and the SCK for shifting out the pixels. "Enable" (renamed to Background) marks the active display area; I use it as a guide for timing tweaks. (It is also used as a gating signal for the external SPI RAM that is not used in this project.)

VGA horizontal scanline: SCK

The following shows the Vsync signal:

VGA Sync Pulses

The firmware has 16 CPU cycles (at 50MHz) to render 8 pixels of video data at 25MHz. A preliminary test (here) shows much better results than expected, as the compiler did a better job optimizing the code. The rendering is done for each scan line inside the Output Compare IRQ.
VGA Timing, Rendering CPU utilization
I set the GPIO signal "Render" just before the rendering code and clear it just after for timing measurement. Rendering takes 16.41us of each 32us scan line, so the firmware can render faster than the video output. A buffer is used to decouple the timing and remove extra cycles that would otherwise be wasted on synchronization.
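
The timing budget can be worked out from the figures quoted above (50MHz CPU clock, 25MHz pixel clock, 16.41us of rendering per 32us scan line). The helper names below are made up for this sketch.

```c
#include <stdint.h>

#define CPU_HZ   50000000u  /* CPU clock from the text */
#define PIXEL_HZ 25000000u  /* pixel clock from the text */

/* CPU cycles that elapse while `pixels` pixels are shifted out. */
uint32_t cycles_per_pixels(uint32_t pixels)
{
    return pixels * (CPU_HZ / PIXEL_HZ);
}

/* Pixel clocks in one scan line of the given period (nanoseconds). */
uint32_t pixel_clocks_per_line(uint32_t line_ns)
{
    return (uint32_t)(((uint64_t)line_ns * PIXEL_HZ) / 1000000000u);
}

/* Rendering load in percent, rounded down. */
uint32_t render_load_percent(uint32_t render_ns, uint32_t line_ns)
{
    return (render_ns * 100u) / line_ns;
}
```

This gives 16 cycles per 8 pixels, 800 pixel clocks per 32us line (of which 640 are visible), and a rendering load of about 51% of each active scan line.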

VGA Generation: IRQ and CPU activities
This diagram shows the activities on an active scan line (32us period).
  1. After the Hsync IRQ finishes the housekeeping and DMA buffer management, it sets the PendSV flag and exits.
  2. PendSV turns off the lower priority IRQs, clears its own interrupt and goes to WFI sleep.
  3. The Output Compare IRQ wakes the ARM up from sleep and starts the SPI DMA. Latency is pre-compensated. No jitter is allowed here as it affects video quality. See here
  4. Firmware starts to render the next scan line while DMA is generating video for the current scan line from the buffer.
  5. The IRQ exits and control returns to PendSV, which re-enables the lower priority IRQs and exits.
  6. The rest of the lower priority IRQs and the driver/application code can run until they get pre-empted by the Hsync IRQ.
About 1/3 of CPU cycles (at 50MHz) are available for application code during an active scanline. This is still quite respectable compared to what was used in those terminals.

Text Buffer

The text is rendered in the IRQ from a 2D array that serves as the text buffer, implementing something similar to the old "Text mode" on a PC. To the higher level routines, writing a character value to the array automatically prints it. Screen operations such as scrolling and clearing work on the text buffer array using the highly optimized memory move and fill functions in the standard C library. E.g. memmove() is a smart memory copy routine that can deal with overlapping source/destination and is perfect for text scrolling.
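
As an illustration of scrolling with memmove(), here is a minimal sketch that scrolls the whole screen up by one row. The 24 x 80 buffer size and the function name are assumptions for the example; the real firmware may use different dimensions and also handle scroll regions.

```c
#include <stdint.h>
#include <string.h>

#define ROWS 24u  /* assumed text rows */
#define COLS 80u  /* assumed text columns */

/* Scroll the whole screen up by one row and blank the last row.
 * Source and destination overlap, which memmove() handles correctly. */
void text_scroll_up(uint8_t buf[ROWS][COLS])
{
    memmove(&buf[0][0], &buf[1][0], (ROWS - 1u) * COLS);
    memset(&buf[ROWS - 1u][0], ' ', COLS);
}
```

Since the video IRQ renders straight from the array, the scrolled text shows up on the next frame without any extra drawing code.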

The cursor is emulated by a separate task which saves the character under the cursor from the array. It periodically alternates between the original character and the cursor, which is drawn as the inverted video character.
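
The save-and-toggle idea can be sketched as below. How inverted video is actually encoded in the text buffer is not described here, so the sketch assumes (purely for illustration) that bit 7 of the character code selects inverse video. The struct and function names are hypothetical.

```c
#include <stdint.h>

#define INVERSE 0x80u  /* assumption: bit 7 marks an inverted glyph */

typedef struct {
    uint8_t *cell;   /* text buffer cell under the cursor */
    uint8_t  saved;  /* original character at that cell */
    uint8_t  shown;  /* nonzero while the cursor glyph is displayed */
} CursorState;

/* Remember the character at the new cursor position. */
void cursor_place(CursorState *c, uint8_t *cell)
{
    c->cell  = cell;
    c->saved = *cell;
    c->shown = 0;
}

/* Called periodically: alternate between original and inverted glyph. */
void cursor_blink(CursorState *c)
{
    c->shown = (uint8_t)!c->shown;
    *c->cell = c->shown ? (uint8_t)(c->saved | INVERSE) : c->saved;
}
```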

Serial port and PS/2

IRQ real time requirement:
  • Serial port at 115200bps: with 10 bits per character (including start and stop bits), a character can arrive once every 1/11520 s = 86.8us.
  • PS/2, a bit can arrive every ~100us
Both of these can be accommodated as the higher priority rendering code finishes before the next character arrives. In the terminal demo, as well as when running the VT100 emulation, it ran at 115,200bps scrolling text smoothly without requiring handshakes. This can simplify the host software on an 8-bit host.
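
The serial timing figure can be reproduced with a short calculation. The helper below is a sketch with a made-up name; it assumes 10 bits on the wire per character (start bit, 8 data bits, stop bit).

```c
#include <stdint.h>

/* Time between back-to-back characters in nanoseconds, rounded down. */
uint32_t char_period_ns(uint32_t baud, uint32_t bits_per_char)
{
    return (uint32_t)(((uint64_t)bits_per_char * 1000000000u) / baud);
}
```

At 115200bps this comes to about 86.8us per character, comfortably longer than the 16.41us the rendering code occupies in each scan line.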

The PS/2 code has robust error recovery and supports hot plugging/removal of a PS/2 keyboard. The PS/2 signals are clamped with Schottky diodes to protect the ARM I/O pins from overvoltage. (See "V.1 mods and changes" in log)

Software Tasks

The program is organized as a series of cooperative tasks running in an endless event driven loop. This has a very low memory overhead as all tasks share the same stack. Low level IRQ drivers run in the background, buffering the I/O using a set of FIFO routines.


while(1)
{
  if(FIFO_ReadAvail((FIFO*)RxBuf))      // Process received serial stream
    ANSI_FSM(Getc((FIFO*)RxBuf));

  if(FIFO_ReadAvail((FIFO*)PS2_Buf))    // Process PS/2 keyboard input
    PS2_Task();
        
  if(Cursor.Update)                     // Blinks the cursor
    Cursor_Task();
}
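
The FIFO routines themselves are not shown in the loop above. A minimal single-producer, single-consumer ring buffer along the same lines might look like the sketch below. The type, the function names and the 64-byte size are hypothetical stand-ins, not the actual ChibiTerm code.

```c
#include <stdint.h>

#define FIFO_SIZE 64u  /* hypothetical size; must be a power of two */

typedef struct {
    volatile uint8_t head;   /* advanced by the IRQ (producer) */
    volatile uint8_t tail;   /* advanced by the main loop (consumer) */
    uint8_t data[FIFO_SIZE];
} Fifo;

/* Number of bytes waiting to be read. */
uint8_t fifo_read_avail(const Fifo *f)
{
    return (uint8_t)((f->head - f->tail) & (FIFO_SIZE - 1u));
}

/* Producer side: returns 0 if the FIFO is full and the byte is dropped. */
int fifo_put(Fifo *f, uint8_t byte)
{
    uint8_t next = (uint8_t)((f->head + 1u) & (FIFO_SIZE - 1u));
    if (next == f->tail)
        return 0;
    f->data[f->head] = byte;
    f->head = next;
    return 1;
}

/* Consumer side: call only when fifo_read_avail() is nonzero. */
uint8_t fifo_get(Fifo *f)
{
    uint8_t byte = f->data[f->tail];
    f->tail = (uint8_t)((f->tail + 1u) & (FIFO_SIZE - 1u));
    return byte;
}
```

Because only the IRQ writes head and only the main loop writes tail, this layout needs no locking on a single core. One slot is always left empty to tell full from empty.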

Each task takes its turn to run, processing data from its FIFO. The tasks are coded as Finite State Machines and run to completion, saving their contexts between calls. This has a nice side effect: each task has exclusive access to shared data and/or I/O streams until it is done.

  • The serial transmit stream is shared between PS2_Task( ) (from key presses) and ANSI_FSM( ) (from host queries). Each task gets to transmit complete data packets without being pre-empted. This prevents the packets from corrupting each other.
  • Cursor_Task( ) and ANSI_FSM( ) share the video output. ANSI_FSM( ) gets to complete scrolling/clear screen/text updates before Cursor_Task( ) draws a cursor. This prevents video artefacts during a scroll and actually simplifies the program structure.
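
To show what a run-to-completion state machine that keeps its context between calls looks like, here is a toy parser for sequences of the form ESC [ number letter. It is a simplified sketch with hypothetical names, not the actual ANSI_FSM( ), which handles far more of the VT100 command set.

```c
#include <stdint.h>

typedef enum { ST_GROUND, ST_ESC, ST_CSI } AnsiState;

typedef struct {
    AnsiState state;  /* context saved between calls */
    uint16_t  param;  /* numeric parameter collected so far */
} AnsiCtx;

/* Feed one received byte and return immediately.
 * Returns the final byte of a completed "ESC [ <number> <letter>"
 * sequence, or 0 if no sequence completed on this byte. */
char ansi_feed(AnsiCtx *ctx, char ch)
{
    switch (ctx->state) {
    case ST_GROUND:
        if (ch == 0x1B)
            ctx->state = ST_ESC;
        break;
    case ST_ESC:
        if (ch == '[') {
            ctx->state = ST_CSI;
            ctx->param = 0;
        } else {
            ctx->state = ST_GROUND;
        }
        break;
    case ST_CSI:
        if (ch >= '0' && ch <= '9') {
            ctx->param = (uint16_t)(ctx->param * 10u + (uint16_t)(ch - '0'));
        } else {
            ctx->state = ST_GROUND;
            return ch;
        }
        break;
    }
    return 0;
}
```

Each call handles exactly one byte, so a sequence split across several passes of the main loop is still parsed correctly. A real terminal would also write printable characters to the text buffer in the ground state, which this sketch leaves out.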

Pinout

Sample application circuit

Sample application schematic
RXD: connect to host TTL serial output

TXD: connect to host TTL serial input

The video output is monochrome, but you can make it less boring:

  • You can get old-style green text by connecting only the Green video signal, like my Matrix Clock Demo. I initialized the SPI so that it serializes LSB first to mirror the fonts.
  • Connect Blue to pin 13 (Backgnd). This outputs a white on blue text (BSoD) like here.
  • Black text on white is supported as part of the VT100 control code.
The bootstrap jumper is used for updating the firmware from the TTL serial port. This makes it easier to update firmware in an embedded design. See ST's Bootloader program page for the PC program and Application note AN2606, "STM32 microcontroller system memory boot mode", for details.

Bill of material

See Github for most up to date version of schematic/layout/BOM.

I have tried alternate low cost Chinese parts in my prototype and they work as well.

The microcontroller STM32F030F4 is currently available with "Free shipping" on Aliexpress at $0.44 US at QTY 10. DIY parts sourcing: (They raised the price, but it is still < $1.) For production, it is also available at a low cost from the official distributors: here.

