(Disclaimer - this is some experimentation, so the information here may not be the best way of doing things.

Also, please read the comments below, which have recent information as people discover things)

17th March 2013 - Anything old is now in purple. Notes have been updated to reflect recent builds.


What is it?

The BeagleBone Black's TI chip (XAM3359AZCZ revision 2) contains the main processor (ARM) along with a number of other modules (see this diagram from the AM335x datasheet).


Although the ARM Cortex-A8 processor portion of the chip is powerful, the nature of Linux means that real-time control of high-speed external hardware is often still not easily possible. The TI chip improves the situation by providing two additional CPUs (known as PRU-ICSS or PRUSSv2; I’ll call it PRU for short) on the same silicon. This means that separate software can be run on them, to offload hardware interfacing and processing of low-level protocols.


The chip has been likened to having Arduino-type capability on the same chip, but the additional CPUs actually run at a far higher speed (200MHz), which in many cases means that external logic devices, CPLDs or FPGAs may not be necessary.

  Generally, having to program more than one processor is inconvenient and means that a protocol is needed between the processors. This is greatly simplified on the TI chip, because (1) the code for the PRUs can be downloaded from the main processor, and (2) shared memory can be used for communication.

Where would it be useful?

For low-speed comms, conventional I2C or similar protocols can be used, and there is no need to use a PRU. For high-speed comms the PRU may be extremely useful because it can service the hardware with no interruptions due to Linux context switching, and no overhead is experienced by the main ARM processor. Here are some examples that should be feasible:

  • interfacing to a fast ADC (e.g. analog capture),
  • CCD or a CMOS camera
  • LED or LCD display
  • analog video generation (video encoder)
  • custom PWM or other custom or non-standard protocols
  • motor control with feedback

  As far as I can tell, it is even possible to clock in parallel data from an external clock.


How to use it?

Currently, it is not straightforward, but certainly not difficult. The main difficulty is finding complete examples on the web. The information here has been gleaned from a lot of web searching and experimenting.

These are the main steps:

  1. Get the PRU system enabled on the BBB board
  2. Get the PRU assembler installed on the BBB (code for the PRUs is written in assembler currently, until someone creates a C compiler for it)
  3. Write the code. PRU applications are in two halves which can communicate with each other through memory addressing:

          (a) The assembler code that is assembled into a .bin machine file to run on the PRU, and

          (b) Some C code that will run on the main processor, i.e. on top of Linux. This C code is responsible for downloading the assembled code into the PRU.

   4. configure the Linux device tree to enable any pins for input/output

   5. Run the program

What is the assembler code like?

It’s not bad. It’s easier than some common assembler languages, such as those for PIC or other 8-bit processors, because there is a large number of registers (all 32-bit), the instructions are orthogonal, and bit and byte referencing for manipulations is extremely good. There are not many instructions; I’ve only used a few so far out of approximately 45, and that suits me fine (usually I don’t want to invest a lot of time learning assembler for an awkward processor, but that is not the case here, and the PRU instructions seem easy to use).


Is it worth the effort?

I think it is, because it becomes possible to control hardware at high speed (say 50MHz). Each instruction takes 5nsec to execute on the PRUs (200MHz clock, one cycle per instruction), with no varying latency due to the Linux kernel.


What are the difficult bits?

Mainly, it is the device tree related stuff. Hopefully this can change or become simplified in the future. On a typical microcontroller, inputs/outputs are set using particular registers that reside in part of the memory map of a device. With the current software running on the BBB, the user is prevented from directly modifying such hardware registers from within conventional C code as far as I can tell. With the current method, a ‘device tree’ is used; it is a text file that is compacted into a binary file which is read when the system is booted. The file tells the system which pins are inputs/outputs. Device tree modifications are also used to enable the PRU system.

What is the device tree?

See a number of posts (e.g. post 109) here, Selsinork has created some useful examples of it, such as using it to switch off an LED which flashes by default on the BBB.

  The device tree resides in the /boot folder on the BBB, and it is a binary file that is not understandable (example snippet below). It has a .dtb or a .dtbo filename:


There is a program on the BBB called dtc that can be used to convert to a text readable form or vice-versa. Here is a snippet of what the text form looks like (usually a .dts suffix):



Working with the device tree means converting the existing binary .dtb file at /boot into a text source .dts file, making some changes and then converting back into the binary format, and then rebooting the BBB for the changes to take effect.

Here is the procedure to convert the binary file into a text file:

cd /boot
cp am335x-boneblack.dtb am335x-boneblack.dtb_orig
dtc -I dtb -O dts am335x-boneblack.dtb > am335x-boneblack.dts_orig

After any modifications have been made (copy the text file first so that you have a backup; the commands below assume the edited copy is named am335x-boneblack.dts_pru), it can be converted back into binary form using:

dtc -I dts -O dtb am335x-boneblack.dts_pru > am335x-boneblack.dtb_pru
cp am335x-boneblack.dtb_pru am335x-boneblack.dtb

These commands will be used for a couple of the steps below.

Step 0: Get the BBB ready in general for any development

Although it is possible to cross-compile code on an x86 Linux server, the BBB is fast enough that there is no need to. Usually I tend to write the code on a Windows or Linux machine, and then use SFTP to transfer the files across to the BBB and compile there. But, sometimes vi still gets used on the BBB. The Angstrom Linux has a few defaults that may not suit everyone. I’ll place them in a separate post; some people may not want to change them, or may have better suggestions. By the way, I used the bash shell for everything, not the default sh, just in case that makes a difference to how environment variables are set.


Anyway, it is advisable to upgrade the software to the latest. This requires a 4GB minimum microSD card that can be programmed from a PC.

Step 1: Get the PRU system enabled on the BBB board

The information in color here is now historic. Today it is possible to enable the PRU using a "dts fragment file" also known as ".dtbo" method. So, skip the colored bit.

By default, lsmod reveals that uio_pruss is not installed, so you have to type the following to install this module:

modprobe uio_pruss

The device tree needs updating to enable the PRUs. Once you have got a text version of am335x-boneblack.dtb (using the method described earlier under "What is the device tree?") then edit it and make the following changes:

Search for pruss@4a300000 and then under it change

status = "disabled";

to

status = "okay";

(note: the correct procedure is not to do the above in the .dtb file but rather in a .dtbo fragment file as explained in the comments below, but there is currently a bug that makes PRU enablement unreliable via the .dtbo) - Note 2 - the bug appears fixed in recent images, so no need to do this in the .dtb file. Just do it in the .dtbo file instead. However, just make the led0 changes here, to disable the flashing.

While you are at it, make Selsinork’s LED change to disable the flashing USR0 LED (the LED on the far end of the board). It is extremely useful to disable it so that you can try to control it from the PRU as an experiment. This is what needs to be done:

The LEDs are controlled by this part:

gpio-leds {
        compatible = "gpio-leds";
        pinctrl-names = "default";
        pinctrl-0 = <0x3>;

        led0 {
                label = "beaglebone:green:usr0";
                gpios = <0x5 0x15 0x0>;
                linux,default-trigger = "heartbeat";
                default-state = "off";
        };

        led1 {..etc



So, we can change led0 (aka USER0 aka USR0) by changing the line from:

linux,default-trigger = "heartbeat";

to

linux,default-trigger = "none";


Ignore the colored bit, it is no longer necessary.

There is one more change for now, but I’ll explain it later. For now, you may wish to make this change too, to run some example code later. If you don’t make the change now, no problem; the device tree can always be updated again at a later date and then the board rebooted.

Search for a line that says

pinctrl-single,pins = <0x54 0x7 0x58 0x17 0x5c 0x7 0x60 0x17>;


and change it to:

pinctrl-single,pins = <0x030 0x06 0x54 0x7 0x58 0x17 0x5c 0x7 0x60 0x17>;

Note: Don't do the pinctrl  modification in recent releases. Just use the .dtbo file method because it works. The information above will be deleted soon, because it does not apply to recent releases.


Save the file, and then convert into the .dtb file as mentioned earlier, and then the board can be rebooted. Now the annoying flashing LED has stopped flashing and can be used for our debug purposes.

Read Step 4 now, to see how to create the fragment file. The ordering is a little back-to-front because in the past, the .dtbo method did not work to enable the PRU. Today it works.

Step 2: Get the PRU assembler installed on the BBB

This is straightforward. Find the file am335x_pru_package-master.zip from the Internet and save it onto the BBB and unzip to a folder.

Type the following:



go to pru_sw/app_loader/interface and type:



then go to pru_sw/utils

mv pasm pasm_linuxintel

cd pasm_source

source ./linux_build

Go to pru_sw/example_apps

make clean




The steps above will have created the assembler, and also some demo programs. As mentioned earlier, PRU applications are in two halves;

     (a) the hand-written assembler code that got assembled into code (a .bin file) to run on the PRU of course, and

     (b) some C code that will run on the main processor.


The latter is responsible for two things:

  1. Uploading the assembled binary file into the PRU, and
  2. interacting with the PRU to pass/fetch information.

The source code for (b) resides in the example_apps/xxx folder, and when compiled it creates a .o file in the obj folder which we can link with libprussdrv.a to create our executable.

The assembler code for (a) resides in the same example_apps/xxx folder as a .p and a .hp file but when assembled, the .bin file sits in example_apps/bin

The commands earlier will have created the .bin file for (a), and the .o file for (b).

The .o file can be linked into an executable using (say):

cd example_apps/PRU_memAccess_DDR_PRUsharedRAM/obj

gcc PRU_memAccess_DDR_PRUsharedRAM.o -L../../../app_loader/lib -lprussdrv -lpthread -o mytest.out


With both the executable (mytest.out) and the .bin file in the same folder (you will have to move the .bin file manually), the executable can now be run:

./mytest.out

This is the output:

INFO: Starting PRU_memAccess_DDR_PRUsharedRAM example.
        INFO: Initializing example.
        INFO: Executing example.
File ./PRU_memAccess_DDR_PRUsharedRAM.bin open passed
        INFO: Waiting for HALT command.
        INFO: PRU completed transfer.
Example executed succesfully.


The folders for the example are a bit messy. To make life easier you may as well copy the library and header files to system folders:

Go to pru_sw/app_loader/lib

cp libprussdrv.a /usr/lib/.

cd ../include

cp *.h /usr/include

cd ../../utils

cp pasm /usr/bin


Now you could use a simple makefile to create your own code in any folder. That is what will be done in Step 3.

An explanation regarding the file suffixes: the .p means assembler source file, the .hp is like an include file but behaves exactly the same as the .p file so technically you could put everything into a single .p file if desired; the assembler is simple to use. No linker is required by the way.


Step 3a: Write the assembler code to run on the PRU

Here I just reused some of the example code (from the PRU_memAccess_DDR_PRUsharedRAM example in Step 2), and modified it so that it flashes the USR0 LED (the LED on the far end of the board).

Here is the entire .p file, I called it prucode.p:

// prucode.p

.origin 0
.entrypoint START

#include "prucode.hp"

#define GPIO1 0x4804c000
#define GPIO_CLEARDATAOUT 0x190
#define GPIO_SETDATAOUT 0x194

START:
    // Enable OCP master port
    LBCO      r0, CONST_PRUCFG, 4, 4
    CLR     r0, r0, 4         // Clear SYSCFG[STANDBY_INIT] to enable OCP master port
    SBCO      r0, CONST_PRUCFG, 4, 4

    // Configure the programmable pointer register for PRU0 by setting c28_pointer[15:0]
    // field to 0x0120.  This will make C28 point to 0x00012000 (PRU shared RAM).
    MOV     r0, 0x00000120
    MOV       r1, CTPPR_0
    ST32      r0, r1

    // Configure the programmable pointer register for PRU0 by setting c31_pointer[15:0]
    // field to 0x0010.  This will make C31 point to 0x80001000 (DDR memory).
    MOV     r0, 0x00100000
    MOV       r1, CTPPR_1
    ST32      r0, r1

    // Load values from external DDR memory into registers R0/R1/R2
    LBCO      r0, CONST_DDR, 0, 12

    // Store the values read from the DDR memory into PRU shared RAM
    SBCO      r0, CONST_PRUSHAREDRAM, 0, 12

    // test GP output
    MOV r1, 10 // loop 10 times
LOOP:
    MOV r2, 1<<21                       // bit 21 = USR0 LED
    MOV r3, GPIO1 | GPIO_SETDATAOUT     // address of the "set" register
    SBBO r2, r3, 0, 4                   // LED on
    MOV r0, 0x00f00000                  // delay count
DEL1:
    SUB r0, r0, 1
    QBNE DEL1, r0, 0
    MOV r2, 1<<21
    MOV r3, GPIO1 | GPIO_CLEARDATAOUT   // address of the "clear" register
    SBBO r2, r3, 0, 4                   // LED off
    MOV r0, 0x00f00000                  // delay count
DEL2:
    SUB r0, r0, 1
    QBNE DEL2, r0, 0
    SUB r1, r1, 1
    QBNE LOOP, r1, 0

    // Send notification to Host for program completion
    MOV       r31.b0, PRU0_ARM_INTERRUPT+16

    // Halt the processor
    HALT

As mentioned, this is derived from the example .p file. The only major difference is around where it says “// test GP output”. All the code before it is an example that shows memory access from the PRU and the ARM processor, as run earlier in Step 2. It is useful since it shows how to communicate between the processors.

The new code is intended to flash the USR0 LED ten times, and here is a description of what the assembler code is doing:

     R0: Used to store a large number for use as a delay

     R1: used to store 10 as a loop counter

     R2: This stores the value 1<<21 which is shorthand for bit 21 set, and all other bits clear. The USR0..3 LEDs are GPIO1_21..24 as can be seen in the schematic (see image below), so if we want to control USR0 then bit 21 gets manipulated


R3: This stores the address of the GPIO1 register which can be found in the am335x tech reference manual (4000 pages) and if you click on the blue text at that location, it goes to the chapter which shows the actual individual register addresses for GPIO1. The code uses this to set or clear the GPIO1 pin 21 in this example.




Some example assembler syntax:

     MOV r1, 10 – This moves 10 into register 1 (i.e. r1 <- 10)

     SBBO r2, r3, 0, 4 – SBBO stands for ‘store byte burst’ and this moves from registers to an address; in this case the contents of r2 into the address at r3. The zero means no offset, and the 4 means copy 4 bytes (each register is 32 bits).


You can see the code has a main outer loop (LOOP), and a couple of delay loops (DEL1 and DEL2).

The prucode.hp file is the same as the .hp file that is in the example folder (I just renamed to prucode.hp); it contains some definitions and some useful macros.

The .p and .hp code gets assembled into a single .bin file. I used a makefile for this, and didn’t assemble it until I had completed step 3b.


Step 3b: Write the C code to run on the ARM processor

The C code is nothing special. It is probably advisable to just re-use the example code. I reused the .c file from the PRU_memAccess_DDR_PRUsharedRAM example and just renamed it to mytest.c

The C code makes use of some library functions that initialize the PRU and download the .bin file into the PRU. It is fairly obvious what is occurring by inspecting the example code.

Step 3c: Create the .bin file for the PRU, and the ARM executable

I’m no Makefile expert but what worked for me is attached. You can just type ‘make’ and it will build the .bin file (prucode.bin) and the executable (mytest).

You can actually run the program now and it will work, without steps 4 and 5.


This will flash the end-most LED on the board 10 times.


Step 4: Configure the Linux device tree to enable any pins for input/output

Although the example worked, actually this is just because we used it to toggle a pin that was already configured for use as a GPIO output (USR0 LED). Ordinarily some work needs to be done with the device tree.

There is also another important point: most pins on the TI chip can be used for multiple purposes, depending on how they are configured. This is known as Pin Multiplexing (pinmux), and it seems complex because there are so many modes (up to eight, numbered 0 to 7) that each pin can be used for.

There is a very useful table that Selsinork created. It is in post #5.

By inspecting this table, you can see which pins on the chip make it out to the headers on the BBB, and the names of the functions in each mode.

This is important to know, because although the PRU runs at a high speed (200MHz), it cannot control pins in the standard GPIO mode at such a high speed. The pins need to be set to a different mode so that the PRU can directly control them. Then, the PRU can control the pins at the high speed (5nsec, i.e. a single clock cycle). The direct mode is known as ‘PRU GPI’ or ‘PRU GPO’ mode (for input and output respectively).

So, in the step 3 above, the LED was controlled with the pin set in standard GPIO mode. If the delay in the .p assembler code was reduced, the LED could be made to flash at around 25MHz max (I checked with a scope). If you want to control at the fastest speed, the direct PRU mode is needed, and currently the only way I know to set this mode is via the device tree.

Once set, the pins can be controlled from the PRU by writing or reading from the special registers R30 (for writing) and R31 (for reading). This is easy to remember, since the 0 in 30 looks like an ‘O’, i.e. output, and the ‘1’ in 31 looks like ‘I’, i.e. Input. There are 16 pins available for input in the fast mode, and 15 for output.


You can see which pins can be used by looking at Selsinork’s table or by inspecting the schematic. For a test, I decided to toggle a pin on a header as a PRU GPO. I decided to use this pin shown on the schematic, since it is clear it can be configured as a PRU GPO because of the name (it has R30 in it, and is less than or equal to 15).


The quicker way is just to check the table. Hit ctrl-F and search for r30 for example. Then, by looking at column 1 you will see which header and pin the connection is available on. For the pin that I decided to use, PR1_PRU0_PRU_R30_14 is GPIO1-12 which goes to P8 pin 12. In the am3359 datasheet, this PRU pin is called GPMC_AD12 and it needs to be in mode 6 to become an output.



Going back to Step 3 briefly, I changed the last bit of the  .p assembler source code to:

    // test GP output
    MOV r1, 10000000 // loop 10,000,000 times
LOOP:
    SET r30.t14
    CLR r30.t14
    SUB r1, r1, 1
    QBNE LOOP, r1, 0
    SBBO r2, r4, 0, 4

    // Send notification to Host for program completion
    MOV       r31.b0, PRU0_ARM_INTERRUPT+16

    // Halt the processor
    HALT



Basically the code now does not flash the LED, but instead pulses a pin quickly (needs an oscilloscope unless you wish to modify this code to insert a delay).

This line sets the pin:

SET r30.t14


and the following line clears the pin.

Issue a ‘make’ to rebuild the code.

Now, back to Step 4:

To set the pin to mode 6, I used a utility to help called PinMuxUtility_02_05_02_00.zip available from TI’s website. Once installed, it looks like this:


If you double-click on the signal that you require, it turns green and then you can go to File->Save->SourceFile and it will save two files: mux.h and pinmux.h

In pinmux.h, search for the text ‘pru’ and it will reveal this line:

MUX_VAL(CONTROL_PADCONF_GPMC_AD12, (IDIS | PD | MODE6 )) /* pr1_pru0_pru_r30[14] */\


So, now you know that to use this signal, you need to set a register to value IDIS | PD | MODE6 and that can be converted to a number using the information in mux.h:

#define MODE0 0
#define MODE1 1
#define MODE2 2
#define MODE3 3
#define MODE4 4
#define MODE5 5
#define MODE6 6
#define MODE7 7

#define IDIS (0 << 5)
#define IEN (1 << 5)

#define PD (0 << 3)
#define PU (2 << 3)
#define OFF (1 << 3)


With this information, it is clear that I need to set a register to 0x06. Table 9-10 in the 4000-page tech ref manual reveals the offset that represents the register.

I found this by just doing a text search in the doc for CONF_GPMC_AD12.


According to page 158 of the tech ref manual, the Control Module address begins at 0x44E10000, so ordinarily we would add the 0x830 offset (the CONF_GPMC_AD12 entry from Table 9-10) to it, to get 0x44E10830. However, according to this link, when using the device tree the address is relative to the base address of the pin gpmc_ad0, which is 0x44e10800. So, the number that will be used in the device tree is 0x30.

Remember in Step 1 this change?

pinctrl-single,pins = <0x54 0x7 0x58 0x17 0x5c 0x7 0x60 0x17>;


was changed to:

pinctrl-single,pins = <0x030 0x06 0x54 0x7 0x58 0x17 0x5c 0x7 0x60 0x17>;


The purpose of this was to set the register to 0x06. You can see the register offset 0x030, and the value 0x06.

This is not the correct way of doing things (since I’ve just tacked it into the LED section in the device tree), but it works.

The supposed correct way actually didn’t work for me, so I’m hoping someone can spot the issue. (note: It is explained in the comments below).

This is what I believe should have worked. It is based on snippets of information from various places on the web:

In the .dts file (same as before), the following additions are to be made in the appropriate sections (just search for something similar, and then insert these lines):

   cape@12 {
           part-number = "BB-BONE-PRU";
           version@00A0 {
                   version = "00A0";
                   dtbo = "cape-bone-pru-00A0.dtbo";
           };
   };

   slot@102 {
           compatible = "kernel-command-line", "runtime";
           board-name = "Bone-Black-PRU";
           version = "00A0";
           manufacturer = "na";
           part-number = "BB-BONE-PRU";
   };



Then convert to .dtb as before.

Create a device tree “fragment” file called BB-BONE-PRU-00A0.dts containing the following:

/*
 * pru dts file BB-BONE-PRU-00A0.dts
 */
/dts-v1/;
/plugin/;

/ {
  compatible = "ti,beaglebone", "ti,beaglebone-black";

  /* identification */
  part-number = "BB-BONE-PRU";
  version = "00A0";

  exclusive-use =
    "P8.12";

  fragment@0 {
    target = <&am33xx_pinmux>;
    __overlay__ {
      mygpio: pinmux_mygpio{
        pinctrl-single,pins = <
          0x30 0x06
          >;
      };
    };
  };

  fragment@1 {
    target = <&ocp>;
    __overlay__ {
      test_helper: helper {
        compatible = "bone-pinmux-helper";
        pinctrl-names = "default";
        pinctrl-0 = <&mygpio>;
        status = "okay";
      };
    };
  };

  fragment@2 {
    target = <&pruss>;
    __overlay__ {
      status = "okay";
    };
  };
};


Convert this file into a .dtbo file using:

dtc -@ -O dtb -o BB-BONE-PRU-00A0.dtbo BB-BONE-PRU-00A0.dts


and then place it at /lib/firmware (not /boot for this fragment).

Then reboot, and then type the following: Note: if the 'export SLOTS' line below doesn't work, try bone_capemgr.9 instead of bone_capemgr.8, or even easier, just use * instead of a number:

export PINS=/sys/kernel/debug/pinctrl/44e10800.pinmux/pins

export SLOTS=/sys/devices/bone_capemgr.8/slots

cd /lib/firmware

cat $SLOTS

0: 54:PF---

1: 55:PF---

2: 56:PF---

3: 57:PF---

4: ff:P-O-L Bone-LT-eMMC-2G,00A0,Texas Instrument,BB-BONE-EMMC-2G

5: ff:P-O-L Bone-Black-HDMI,00A0,Texas Instrument,BB-BONELT-HDMI

echo BB-BONE-PRU > $SLOTS

cat $SLOTS

0: 54:PF---

1: 55:PF---

2: 56:PF---

3: 57:PF---

4: ff:P-O-L Bone-LT-eMMC-2G,00A0,Texas Instrument,BB-BONE-EMMC-2G

5: ff:P-O-L Bone-Black-HDMI,00A0,Texas Instrument,BB-BONELT-HDMI

6: ff:P-O-L Override Board Name,00A0,Override Manuf,BB-BONE-PRU

Now if you try the commands in purple below, you will see correct value of 00000006!


Now you can see the slot entry, but the pin does not appear to be set to the right mode using this method : - (

I saw this using:

cat $PINS | grep 830

pin 12 (44e10830) 00000027 pinctrl-single


The ‘00000027’ value should become ‘00000006’ but it didn’t work for me. But if I do the quick hack method described earlier, then it is ok:

cat $PINS | grep 830

pin 12 (44e10830) 00000006 pinctrl-single



Any suggestions would be appreciated.

Step 5: Run the program

When you execute ./mytest it will toggle the pin 12 on header P8 rapidly, observable on a scope to be at a rate of 50MHz. So, success, but not using the ideal method in the device tree.

I have not tested inputs yet (it should be similar), or anything more advanced. (note: see the comments below for an example of how to read inputs in the GPIO mode)


Where to get more info?

There is a lot of documentation in the am335x_pru_package-master.zip file mentioned in Step 2. I found it extremely useful.

For the device tree, see post 2 here for some very useful links.

I also found lots of useful stuff here and here.

Also, very useful link: http://pinmux.tking.org/ will allow you to very quickly see the pinmux value!! Highly Recommended.

Another note: When doing high speed GPI mode input, the pinmux value needs the "Input Enabled" box checked at the http://pinmux.tking.org/ site.