We know SPI as a 4*-wire protocol. But it doesn't have to be.

I'm checking high-speed SPI data transfer with buffers, DMA and parallel data lines.


Buffered SPI versus Naive


In this blog, I introduce buffered SPI. I'm sending 64 16-bit values in one SPI call.


Our Unbuffered SPI Performance


Until now, I've sent data to the LCD pixel by pixel. I've optimised the drawing functions as much as I could.

I reduced the

  • number of SPI calls from 8 * 128 = 1024 per line to 7 + 128 = 135 per line,
  • number of bytes sent from 128 * 13 = 1664 per line to 11 + 128 * 2 = 267 bytes per line,
  • line draw time from 2.6 ms to 376 µs.
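
The arithmetic behind those numbers can be checked with a small sketch. It assumes, reading the formulas above, that each count is a fixed per-line setup part plus a per-pixel part over 128 pixels. The function names are made up for this illustration and are not part of the firmware.

```c
#include <stdint.h>

#define PIXELS_PER_LINE 128u  /* pixels in one LCD line (from the post) */

/* SPI calls for one line: fixed setup calls plus the calls made per pixel. */
static uint32_t spi_calls_per_line(uint32_t setup_calls, uint32_t calls_per_pixel) {
    return setup_calls + PIXELS_PER_LINE * calls_per_pixel;
}

/* Bytes on the wire for one line: fixed setup bytes plus the bytes per pixel. */
static uint32_t spi_bytes_per_line(uint32_t setup_bytes, uint32_t bytes_per_pixel) {
    return setup_bytes + PIXELS_PER_LINE * bytes_per_pixel;
}
```

With these helpers, `spi_calls_per_line(0, 8)` gives the original 1024 calls and `spi_calls_per_line(7, 1)` the optimised 135. `spi_bytes_per_line(0, 13)` gives 1664 bytes and `spi_bytes_per_line(11, 2)` gives 267 bytes.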

That's as far as I can go with the pixel-by-pixel approach. There isn't much more gain to be eked out of this design.


Buffered SPI: Offload Work to the SPI module


If I want to improve drawing speed, I'll have to switch to transmitting multiple pixels in one SPI call.

The Hercules microcontroller that I use supports this. Next to standard SPI, it has a multi-buffered SPI (MibSPI) mode.

I'll configure the MibSPI to take a buffer of 64 16-bit words. The MibSPI functions will take care that the SPI module accepts that buffer and pumps that to the LCD in one go.

I'll be able to send all pixels for a full line in 2 SPI calls instead of 128.


I can reuse the data format that I'm already using in the pixel-by-pixel approach. The data doesn't change.

It's still 16-bit, 27.5 MBaud as before.

What I do change is the way that data is handed over to the SPI module. I configure MibSPI so that it preps and sends 64 entities of data format 1 in one single shot.



In the firmware, I tell MibSPI the address of the first 16-bit value. It will then, all by itself, send the 64 values starting at that address to the slave, in one shot.


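void _writeData64(uint16_t *data) {
    /* set the LCD's data/command pin to 1 before writing the data words */
    gioSetBit(_portDataCommand, _pinDataCommand, 1);
    /* hand the 64-word buffer to transfer group 2 and start the transfer */
    mibspiSetData(mibspiREG3, 2, &data[0]);
    mibspiTransfer(mibspiREG3, 2);
    while(!(mibspiIsTransferComplete(mibspiREG3, 2))) {
        /* busy-wait until transfer group 2 has been sent */
    }
}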


A full line takes me 2 calls:


void flashBitmapLine(bitmap_t *bmp, uint32_t line) {
    loadBitmapInDMABuffer(bmp, line);
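
The listing above is cut short, so here is a minimal sketch of how one 128-pixel line could be split into two 64-word transfers. It is not the original firmware code. `sendLineInChunks`, `send64_fn` and `fakeWriteData64` are hypothetical names, and the fake sender is only there so the sketch can be tried on a PC. On the target, the callback would be `_writeData64`.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_PIXELS 128u  /* pixels per LCD line (from the post) */
#define CHUNK_WORDS  64u  /* 16-bit words per MibSPI transfer (from the post) */

/* Hypothetical callback type: sends exactly CHUNK_WORDS 16-bit words. */
typedef void (*send64_fn)(uint16_t *data);

/* Sends one line as consecutive 64-word chunks; returns the number of calls. */
static uint32_t sendLineInChunks(uint16_t *lineBuffer, send64_fn send64) {
    uint32_t calls = 0u;
    for (size_t offset = 0u; offset < LINE_PIXELS; offset += CHUNK_WORDS) {
        send64(&lineBuffer[offset]);  /* one SPI call per 64 pixels */
        calls++;
    }
    return calls;
}

/* Host-side stand-in for _writeData64(): records the first word of each chunk. */
static uint16_t firstWords[2];
static uint32_t chunksSeen = 0u;

static void fakeWriteData64(uint16_t *data) {
    if (chunksSeen < 2u) {
        firstWords[chunksSeen] = data[0];
    }
    chunksSeen++;
}
```

Calling `sendLineInChunks(line, fakeWriteData64)` on a 128-word buffer makes two calls, the first starting at word 0 and the second at word 64.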


The changes I had to make to the code are small. I get a decent payback though.


I still require 16 µs to get the LCD line initialised. The pixel data though, now sent as two buffers of 64 16-bit words, dropped from 360 µs to 97.2 µs.


Compared to the unbuffered version:

  • number of SPI calls drops from 7 + 128 = 135 per line to 7 + 2 = 9 per line,
  • number of bytes sent stays 11 + 128 * 2 = 267 bytes per line. I've exhausted the possibilities to improve that already. We send exactly the same data (with fewer SPI calls, and faster).
  • line draw time drops from 376 µs unbuffered to 114 µs buffered.
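
To put the measurements in perspective, here is a rough back-of-the-envelope sketch. It assumes that 27.5 MBaud means 27.5 Mbit/s on a single data line and that there is no idle time between words. Both are assumptions, not measurements, and the helper names are made up for this illustration.

```c
#include <stdint.h>

/* Ideal time in µs to clock `words` 16-bit words at `mbaud` Mbit/s,
 * ignoring any gaps between words or between transfers. */
static double wire_time_us(uint32_t words, double mbaud) {
    return (double)(words * 16u) / mbaud;
}

/* How many times faster the new timing is compared to the old one. */
static double speedup(double before_us, double after_us) {
    return before_us / after_us;
}
```

Under those assumptions, `wire_time_us(128, 27.5)` comes out at roughly 74.5 µs for the pixel data of one line, so the measured 97.2 µs is already fairly close to what the wire allows. `speedup(376.0, 114.0)` is about 3.3 for the whole line.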


Long live buffered SPI. It doesn't make my program more complex, but it improves speed drastically.

This changes in the next blog, where I bring DMA into the mix. That adds some complexity.

I'm starting my learning now: https://training.ti.com/hercules-tutorial-mibspi-and-dma-overview


The Series
0 - Buffers and Parallel Data Lines
1a - Buffers and DMA
1b - SPI without Buffers
2 - SPI with Buffers
3a - SPI with DMA
3b - SPI with DMA works
4a - SPI Master with DMA and Parallel Data Lines
Hercules microcontrollers, DMA and Memory Cache