I recently had an excuse to try writing a PIO program for the Pi Pico. Before that, I had only copy-pasted the WS2812B PIO state machine code.
This may or may not be the greatest example, but 4-bit interfacing with the HD44780 LCD controller does take a small amount of bit wrangling.
As a brief refresher: in 4-bit mode you strobe the upper 4 bits of each character and then the lower 4 bits, in succession, on data lines DB4 to DB7.
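To make the ordering concrete, here is a small plain-Python sketch (not PIO code) of the nibble sequence a character turns into on the 4-bit bus. The function names `nibble_sequence` and `nibble_stream` are hypothetical helpers for illustration only.

```python
def nibble_sequence(char_code: int) -> list[int]:
    """Return the two 4-bit values sent for one character in 4-bit mode:
    the upper nibble first, then the lower nibble."""
    if not 0 <= char_code <= 0xFF:
        raise ValueError("HD44780 character codes are 8 bits")
    high = (char_code >> 4) & 0x0F
    low = char_code & 0x0F
    return [high, low]


def nibble_stream(text: str) -> list[int]:
    """Flatten a string into the order the nibbles appear on DB4..DB7."""
    out = []
    for ch in text:
        out.extend(nibble_sequence(ord(ch)))
    return out


print(nibble_stream("Hi"))  # 'H' = 0x48, 'i' = 0x69 -> [4, 8, 6, 9]
```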
The enable (EN) line is driven as a side-set output from the PIO state machine, and the LCD data bits are driven as PIO output pins.
```python
import rp2
from rp2 import PIO

@rp2.asm_pio(out_shiftdir=PIO.SHIFT_RIGHT, sideset_init=PIO.OUT_LOW,
             out_init=(PIO.OUT_LOW,) * 4, fifo_join=PIO.JOIN_TX)
def HD44780():
    # Actual program body follows
    wrap_target()
    pull(block)             .side(0)      # Load 32-bit word from FIFO into OSR
    set(y, 3)               .side(0)      # Loop 4 times for 4 8-bit chars
    label('charout')
    out(x, 4)               .side(0)      # Stash lower 4 char bits
    out(pins, 4)            .side(1)      # Write high 4 char bits, drive EN high
    nop()                   .side(0) [1]  # 2 cycles
    mov(pins, x)            .side(1)      # Write low 4 char bits, drive EN high
    set(x, 8)               .side(0)      # Loop (8+1) times
    label('chardelay')
    nop()                   .side(0) [7]  # 7+1 cycles
    jmp(x_dec, 'chardelay') .side(0)      # 1 cycle
    jmp(y_dec, 'charout')   .side(0)      # 1 cycle
    wrap()                                # 0 cycles
```
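Since the PIO program can only run on the Pico, a rough cycle-level model in ordinary Python can help check the comments against the code. This is a sketch under stated assumptions, not the real PIO: every instruction takes one cycle plus its delay, the two nop() instructions carry the [1] and [7] delays their cycle comments imply, the FIFO already holds a word so pull(block) never stalls, and the LCD latches data on the falling edge of EN. `simulate_word` is a hypothetical name.

```python
def simulate_word(word: int) -> tuple[list[int], int]:
    """Model one pass of the HD44780 PIO program for a single 32-bit word.

    Returns (latched, cycles): the nibbles on the data pins at each falling
    edge of EN, and the number of state machine cycles used.
    """
    trace = []  # one (pins, en) entry per state machine cycle
    pins = 0

    def run(en: int, cycles: int = 1) -> None:
        # Record `cycles` clock cycles with the current pin values.
        for _ in range(cycles):
            trace.append((pins, en))

    osr = word & 0xFFFFFFFF  # pull(block) .side(0), assuming the FIFO is not empty
    run(0)
    y = 3                    # set(y, 3) .side(0)
    run(0)
    while True:              # label('charout')
        x = osr & 0xF        # out(x, 4): shifting right takes the low bits first
        osr >>= 4
        run(0)
        pins = osr & 0xF     # out(pins, 4) .side(1): high nibble, EN high
        osr >>= 4
        run(1)
        run(0, 2)            # nop() .side(0) [1]: EN low for 2 cycles
        pins = x             # mov(pins, x) .side(1): low nibble, EN high
        run(1)
        x = 8                # set(x, 8) .side(0)
        run(0)
        while True:          # label('chardelay')
            run(0, 8)        # nop() .side(0) [7]: 7+1 cycles
            taken = x != 0   # jmp(x_dec, ...) tests x, then decrements it
            x -= 1
            run(0)
            if not taken:
                break
        taken = y != 0       # jmp(y_dec, 'charout')
        y -= 1
        run(0)
        if not taken:
            break            # wrap() back to pull(block) costs 0 cycles

    # Data is latched when EN goes from high to low.
    latched = [prev[0] for prev, cur in zip(trace, trace[1:])
               if prev[1] == 1 and cur[1] == 0]
    return latched, len(trace)


nibbles, cycles = simulate_word(0x44434241)  # "ABCD", first char in the low byte
print(nibbles, cycles)
```

In this model each character costs 88 cycles (6 for the two strobes, 81 for the delay loop, 1 for the outer jmp), plus 2 cycles per word for the pull and set.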
I chose to clock the PIO state machine at 2 MHz, so each cycle is 500 ns. The ".side(x)" sets the enable pin state, and "[n]" adds an additional n-cycle delay to the instruction's execution time.
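The 2 MHz state machine clock is divided down from the system clock. A quick sketch of that arithmetic, assuming a 125 MHz system clock (the long-standing RP2040 default, but an assumption here; check machine.freq() on your own board). The RP2040 clock divider holds a 16-bit integer part and an 8-bit fractional part, so 62.5 is representable, but with a fractional divider individual cycles vary slightly around 500 ns rather than each being exactly 500 ns. `clkdiv` is a hypothetical helper.

```python
SYS_CLK_HZ = 125_000_000  # assumption: default RP2040 system clock
PIO_FREQ_HZ = 2_000_000   # state machine clock used in the text


def clkdiv(sys_hz: int, sm_hz: int) -> tuple[int, int]:
    """Split sys_hz / sm_hz into the integer part and the 1/256 fractional
    part of a 16.8 fixed-point clock divider."""
    div = sys_hz / sm_hz
    div_int = int(div)
    div_frac = round((div - div_int) * 256)
    if div_frac == 256:  # rounding carried into the integer part
        div_int, div_frac = div_int + 1, 0
    return div_int, div_frac


div_int, div_frac = clkdiv(SYS_CLK_HZ, PIO_FREQ_HZ)  # (62, 128), i.e. 62.5
cycle_ns = 1e9 / PIO_FREQ_HZ                          # 500.0 ns per PIO cycle
print(div_int, div_frac, cycle_ns)
```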
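I chose to make full use of the FIFO by packing 4 characters into each 32-bit word, but one could just as easily accept only the low 8 bits of each FIFO word as a single character. Because the OSR shifts right, the character in the least significant byte is sent first. With fifo_join=PIO.JOIN_TX the TX FIFO is 8 words deep, so up to 32 characters can be queued.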
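On the host side, the text has to be packed into words in that same byte order. A minimal sketch, assuming ASCII text and padding any short final group with spaces (the program always writes 4 characters per word, so the padding choice is an assumption); `pack_chars` is a hypothetical helper.

```python
def pack_chars(text: str, pad: str = " ") -> list[int]:
    """Pack a string into 32-bit FIFO words, 4 characters per word.

    The first character goes in the least significant byte, matching a
    right-shifting OSR. A short final group is padded with `pad` (one char).
    """
    data = text.encode("ascii")
    if len(data) % 4:
        data += pad.encode("ascii") * (4 - len(data) % 4)
    return [int.from_bytes(data[i:i + 4], "little")
            for i in range(0, len(data), 4)]


print([hex(w) for w in pack_chars("Hello")])  # ['0x6c6c6548', '0x2020206f']
```

On the Pico, each word would then be handed to the state machine with sm.put(word), where sm is an rp2.StateMachine running the HD44780 program.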
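Even the character-to-character delay is handled by the PIO state machine: the inner loop runs 9 times at 9 cycles per pass, for 81 cycles, or 40.5 µs at 2 MHz, which covers the 37 µs the HD44780 needs to execute a data write.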
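The 81-cycle figure can be checked from the loop parameters. A short calculation, assuming the nop() in the delay loop carries a [7] delay as its "7+1 cycles" comment indicates, and taking 37 µs as the HD44780 data-write execution time from its datasheet:

```python
PIO_FREQ_HZ = 2_000_000
X_INIT = 8           # set(x, 8)
NOP_DELAY = 7        # nop() .side(0) [7]
HD44780_WRITE_US = 37  # datasheet execution time for a data write

# jmp(x_dec) only falls through once x is 0, so the body runs X_INIT + 1 times.
# Each pass costs the nop (1 + NOP_DELAY cycles) plus the jmp (1 cycle).
delay_cycles = (X_INIT + 1) * ((1 + NOP_DELAY) + 1)
delay_us = delay_cycles * 1e6 / PIO_FREQ_HZ

print(delay_cycles, delay_us)  # 81 40.5
```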
There are a few other neat tricks for a transmit-only state machine, such as using the input shift register (ISR) as an additional scratch register or as an input parameter, as seen in the PWM example at: https://github.com/raspberrypi/pico-micropython-examples/tree/master/pio