If you’ve ever wondered how the Pi works internally, the answer depends on how deeply you want to examine it. Each component on the circuit board has its own documentation, sometimes running to thousands of pages. I wanted to explore it at a higher level, but still in enough detail to begin to understand how the Pi functions when computer code is executed. The diagrams and explanations here are not 100% accurate (and the Pi probably does not have little robots inside it, as shown in the diagrams), but they are roughly representative of how some computers function at a high level.
Voltages and Clocks
The Pi has a micro USB connector, which is used solely to supply power to the Pi. On the circuit board, voltage converter circuits (known as DC-DC converters) derive lower voltages (typically 1.8V and 3.3V) from the 5V supply. These particular voltages are commonly used to power many microchips, i.e. integrated circuits (ICs), these days.
Nearly all digital electronics these days require a clock source. A clock is a digital signal that alternates between two levels (for example 3.3V and 0V) at a particular frequency. The Pi needs many different clock frequencies, and these are ultimately derived from a crystal, which behaves like a tuning fork: when energized, it vibrates at a very accurately known frequency. Oscillator circuits and phase-locked loops (PLLs) are combined to derive signals of different frequencies from the main crystal frequency.
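As a rough sketch of the arithmetic, an idealised integer PLL multiplies the reference (crystal) frequency by one whole number and divides the result by another. The 19.2 MHz crystal value below is an assumption (a figure commonly quoted for the early Pi boards), and the multiplier and divider values are purely hypothetical; real PLLs also use fractional multipliers and extra divider stages.

```c
#include <stdint.h>

/* Assumed crystal frequency; check your board's documentation. */
#define CRYSTAL_HZ 19200000ULL

/* Idealised integer PLL: f_out = f_ref * multiplier / divider.
   64-bit arithmetic avoids overflow for GHz-range intermediate values.
   `divider` must be non-zero. */
uint64_t pll_output_hz(uint64_t ref_hz, uint32_t multiplier, uint32_t divider)
{
    return ref_hz * multiplier / divider;
}
```

With these hypothetical settings, `pll_output_hz(CRYSTAL_HZ, 125, 3)` gives 800000000, i.e. 800 MHz, from a 19.2 MHz reference.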
The Central Processing Unit (CPU)
Speaking of digital electronics, this is the foundation of the computer. A computer consists of a few important parts, and at its heart is the central processing unit (CPU), built from digital electronics. The computer runs software programs, which are ultimately sequences of instructions sent to the CPU. The CPU’s job is to act on the instructions and store the results in memory or apply them onto wires as signals. A signal is a controlled voltage, or set of voltages, on a wire. Through agreed standards, whenever a TV sees a particular signal on its HDMI connections, it generates a video image, and as the signal changes over time, the video image is constructed and changes over time. Similarly, the signals on the wires attached to the USB connector are defined in standards. Provided the signals have the correct voltage levels and are varied over time in the defined manner, any attached USB device will recognise them and behave accordingly as (say) a mouse or a printer.
Although the standards can be complex, it all boils down to this: all these attached devices require digital logic signals output from the computer under the control of a CPU, which takes instructions as input and acts on them. It could take thousands or tens of thousands of instructions just to print ‘Hello’ on a USB-attached printer. Fortunately, we don’t need to write all those instructions ourselves; open source software and operating systems such as Linux already contain code that has been written, tested and debugged over the years, and we can make use of it.
There are many different implementations of CPUs (and many books have been written about them); however, a typical CPU includes an instruction decoder, registers and an address decoder.
Registers are used to store numbers, and a CPU has several of them, with names like R0, R1, R2 and so on. These are (usually) general-purpose registers that temporarily hold data and memory addresses while the CPU works through the instructions. There is a special register called the Program Counter, or PC. It holds the location in memory of the next instruction to fetch, so its value usually increases as each instruction is processed. The instructions come from memory such as RAM or ROM.
The flow is that the blue robot consults the PC to determine which memory location the instruction or data needs to be fetched from. The yellow robot consults a table to determine where that location resides (the RAM or ROM could be inside the same microchip, or on another chip on the Pi). Once the location has been determined, the instruction or data is fetched, and either processed further or placed in a register.
As an example, a particular instruction might load the next item in memory into register R1. The instruction after that may add the contents of two registers and store the result in memory. Another instruction could change the contents of the PC register to some particular value; this makes execution jump from its normal linear sequence to a different area of RAM or ROM. Yet another instruction could set some signal wires to the value held in a register. Since digital electronics works with low and high voltages, a number converted into binary can control many signal wires simultaneously. An 8-bit binary number has a decimal range of 0-255, so a value of (say) 85 in decimal is equivalent to 01010101 in binary, which would set eight signal wires to alternating low and high voltages.
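To make the binary example concrete, here is a small C sketch that splits a byte into eight high/low levels, one per signal wire. The wire ordering (most significant bit on the first wire) is an assumption chosen for illustration.

```c
#include <stdint.h>

/* Writes the eight bits of `value` into `levels`, most significant bit first.
   Each entry is 1 (high voltage) or 0 (low voltage), one per signal wire. */
void byte_to_wire_levels(uint8_t value, int levels[8])
{
    for (int bit = 7; bit >= 0; bit--) {
        levels[7 - bit] = (value >> bit) & 1;
    }
}
```

Calling `byte_to_wire_levels(85, levels)` fills `levels` with 0, 1, 0, 1, 0, 1, 0, 1, matching 01010101.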
The power of a computer depends in part on the richness of the instructions it can perform. Some typical instructions relate to data processing, such as adding numbers held in registers or memory. Others move data between registers and memory. Another group of instructions is used for conditional execution: these change the PC value (so that different code is executed), but only if a particular condition is satisfied (such as the contents of a register being zero).
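The fetch-decode-execute cycle described above can be sketched as a tiny interpreter. The instruction set here is entirely hypothetical (the opcode names MOVI, ADDI, ADD, STR, BNEZ and HALT are invented for this example and are not ARM instructions), but it has one instruction from each group mentioned: data processing, moving data to memory, and a conditional branch that rewrites the PC.

```c
#include <stdint.h>

/* A hypothetical, much-simplified instruction set (not ARM). */
enum opcode { MOVI, ADDI, ADD, STR, BNEZ, HALT };

struct instr {
    enum opcode op;
    int rd;        /* destination (or tested) register number          */
    int rn;        /* source register number                           */
    int32_t imm;   /* immediate value, memory address or branch target */
};

/* Fetch-decode-execute loop. Runs `prog` until HALT and returns the
   number of instructions executed (including the HALT). */
int run(const struct instr *prog, int32_t reg[4], int32_t mem[16])
{
    int pc = 0;                              /* program counter */
    int executed = 0;
    for (;;) {
        struct instr in = prog[pc++];        /* fetch, then advance the PC */
        executed++;
        switch (in.op) {                     /* decode and execute */
        case MOVI: reg[in.rd] = in.imm; break;                   /* load a constant   */
        case ADDI: reg[in.rd] = reg[in.rn] + in.imm; break;      /* data processing   */
        case ADD:  reg[in.rd] = reg[in.rd] + reg[in.rn]; break;  /* data processing   */
        case STR:  mem[in.imm] = reg[in.rd]; break;              /* register -> memory */
        case BNEZ: if (reg[in.rd] != 0) pc = in.imm; break;      /* conditional branch */
        case HALT: return executed;
        }
    }
}

/* Demo program: adds 5 + 4 + 3 + 2 + 1 and stores the total in mem[0]. */
const struct instr sum_demo[] = {
    { MOVI, 0, 0, 0 },    /* 0: R0 = 0 (running total)  */
    { MOVI, 1, 0, 5 },    /* 1: R1 = 5 (counter)        */
    { ADD,  0, 1, 0 },    /* 2: R0 = R0 + R1            */
    { ADDI, 1, 1, -1 },   /* 3: R1 = R1 - 1             */
    { BNEZ, 1, 0, 2 },    /* 4: if R1 != 0, set PC to 2 */
    { STR,  0, 0, 0 },    /* 5: mem[0] = R0             */
    { HALT, 0, 0, 0 },    /* 6: stop                    */
};
```

Running `sum_demo` with zeroed registers and memory leaves 15 in `mem[0]` after 19 instructions: the branch at index 4 sends the PC back to index 2 until the counter reaches zero.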
These instructions may seem very simplistic compared to high-level goals such as controlling a printer, but with a sufficiently large number of instructions, any complex behaviour can be achieved (this is demonstrated toward the end of this blog post, where such simple instructions are used to build a times table). The CPU operates at high speed, so hundreds of millions of instructions can be executed per second.
General Purpose Input/Output (GPIO)
By setting signals onto certain wires, the CPU can control devices attached to the 40-way connector on the Pi. The way this works is that instructions either fetch a value from memory or calculate one using the data processing instructions from the instruction set. In binary, that value (a 32-bit number on the Pi's CPU) can control multiple pins at once, setting each of them high or low. The CPU core operates at very low voltages, so the signals are buffered up to 3.3V levels before being sent to the 40-way connector. Some pins on the 40-way connector are dedicated to GPIO, while others are connected to 0V, 5V and 3.3V. This means it is possible to power attached circuitry without necessarily needing a second power supply (but take care not to draw too much current).
The GPIO works both ways; it is possible through instruction execution to set some of the signals to become inputs instead of outputs. It then becomes possible for CPU instructions to read binary values from external circuits and store the result in memory or act on it conditionally.
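The bit manipulation behind driving and reading pins can be sketched in C. This does not touch real Pi hardware: the two "registers" are simulated as plain variables, and the mapping of bit n to pin n is a hypothetical simplification (on the real Pi, GPIO is controlled through memory-mapped registers described in the hardware manual).

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated 32-bit GPIO registers: bit n corresponds to (hypothetical) pin n.
   On real hardware these would be memory-mapped registers, not variables. */
static uint32_t gpio_output_reg = 0;  /* levels the CPU drives onto output pins */
static uint32_t gpio_input_reg  = 0;  /* levels read back from input pins       */

/* Set or clear one pin without disturbing the others (pin must be < 32). */
void gpio_set_high(unsigned pin) { gpio_output_reg |=  (UINT32_C(1) << pin); }
void gpio_set_low(unsigned pin)  { gpio_output_reg &= ~(UINT32_C(1) << pin); }

/* Drive pins 0-7 at once from one binary value, e.g. 85 = 01010101. */
void gpio_write_byte(uint8_t value)
{
    gpio_output_reg = (gpio_output_reg & ~UINT32_C(0xFF)) | value;
}

/* Read one input pin: true if the external circuit is holding it high. */
bool gpio_read(unsigned pin) { return (gpio_input_reg >> pin) & 1u; }
```

The OR and AND-NOT operations are the usual way to change a single bit while leaving the rest of a register untouched; writing a whole byte is the "many wires simultaneously" case from the CPU section.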
HDMI Video Output
The HDMI connection is used to provide several streams of high-speed signals to the attached TV or monitor in order to display video. This is achieved by taking bytes of memory and chopping them up and sending the serial bit streams to the HDMI connections. Timing is important, and the bit streams are sent at a defined rate using a clock generated from the PLL circuit described earlier.
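The "chopping up" of bytes into a serial bit stream can be sketched as follows. Sending the least significant bit first is an assumption for illustration, and real HDMI additionally encodes each 8-bit value as 10 bits (TMDS encoding) before sending it, which this sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>

/* Chops `n` bytes into a serial stream of bits, least significant bit first.
   `bits` must have room for n * 8 entries. Returns the number of bits written. */
size_t serialize_bytes(const uint8_t *bytes, size_t n, uint8_t *bits)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        for (int b = 0; b < 8; b++) {
            bits[count++] = (bytes[i] >> b) & 1u;   /* one bit per clock tick */
        }
    }
    return count;
}
```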
Analog Video and Audio Output
There is a 3.5mm jack socket on the Pi, used to output stereo audio and an analog video stream (this video stream usually goes to the composite phono or RCA connector on older TVs). A stream of bits (i.e. a signal composed of high and low levels) is averaged, or smoothed out, using a filter circuit. The result is a signal whose level varies gradually according to the proportion of time the bit stream spends high or low. Several analog signals are generated by the Pi; two of them are the stereo audio channels, for direct connection to headphones or an external amplifier. The video output is useful as a last resort if HDMI isn't used, but its low resolution is not ideal for everyday use of the Pi; HDMI offers much higher resolution.
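The averaging idea can be sketched numerically. The function below is an idealised stand-in for the analog filter circuit: it simply treats the smoothed level as the fraction of time the stream was high, scaled by the high voltage. A real filter responds gradually over time rather than computing an exact average.

```c
#include <stddef.h>
#include <stdint.h>

/* Idealised low-pass filter: the smoothed output level is the fraction of
   time the bit stream spent high, scaled by the high-level voltage. */
double filtered_level(const uint8_t *bits, size_t n, double high_volts)
{
    size_t high = 0;
    for (size_t i = 0; i < n; i++) {
        high += bits[i] ? 1 : 0;   /* count time slots spent high */
    }
    return n ? high_volts * (double)high / (double)n : 0.0;
}
```

A stream that is high half the time, such as 1, 0, 1, 0 with a 3.3V high level, averages to about 1.65V; changing the proportion of high bits over time is what makes the output level vary.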
USB and Ethernet
The Pi uses a trick to provide four USB connectors and Ethernet capability. Again, bytes of data are chopped up into a stream of bits, but this single stream is shared across all the USB connectors and the Ethernet port. A special circuit known as a USB hub is used to share the stream across them all.
Micro SD Card
To send and retrieve information to and from the micro SD card, bytes of information are chopped up into several streams of bits and sent between the card and the Raspberry Pi. Since there are only four streams, moving data takes some time; it is not as fast as storing and retrieving data directly from RAM, which has 32 streams of signals. Furthermore, the memory inside SD cards is itself slower. An even faster location for data is the registers in the CPU (e.g. R1, R2, etc.), but there are not many registers. Software is therefore usually written so that the most frequently accessed information is kept in the location with the fastest access times.
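The effect of the number of streams can be estimated with simple arithmetic. This sketch assumes one bit per line per clock cycle and ignores clock-speed differences, command overhead and the card's internal delays, so it only compares the width of the connections. Which half of a byte travels first on the 4-line bus is also an assumption here.

```c
#include <stdint.h>

/* Clock cycles needed to move `nbytes` over `lines` parallel signal lines,
   assuming one bit per line per cycle. `lines` must be non-zero. */
uint64_t transfer_cycles(uint64_t nbytes, unsigned lines)
{
    uint64_t bits = nbytes * 8;
    return (bits + lines - 1) / lines;   /* round up: a partial cycle still costs a cycle */
}

/* On a 4-line bus each byte travels as two 4-bit halves (nibbles). */
void byte_to_nibbles(uint8_t value, uint8_t *high, uint8_t *low)
{
    *high = value >> 4;
    *low  = value & 0x0F;
}
```

Under these assumptions, moving 512 bytes takes 1024 cycles over 4 lines but only 128 cycles over 32 lines, an eight-fold difference from the width alone.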
A CPU Demonstration
To demonstrate that complex behaviour can be achieved from simple instructions, it is interesting to write a small program and try to follow how it works.
The code here performs multiplication: it generates a 2-times table. In other words, it stores in memory the numbers 0, 2, 4, 6, 8 and so on, up to 18.
In near-English (i.e. pseudo-code), this is what we want to achieve:
for i = 0 to 9:
    res[i] = i * 2
loop
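For comparison, the same pseudo-code written in C (one of the "easy-to-use" languages mentioned later) might look like the sketch below. The function name `two_times_table` is hypothetical; note that `i` runs from 0 to 9 inclusive, so the first entry is 0 and the last is 18.

```c
/* Fills res[0..9] with the 2-times table: res[i] = i * 2. */
void two_times_table(int res[10])
{
    for (int i = 0; i <= 9; i++) {
        res[i] = i * 2;
    }
}
```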
The pseudo-code can be rewritten into machine instructions that we want the processor to run:
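.myprog:
    ldr r3, .i                @ R3 = address of i
    mov r2, #0                @ R2 = 0
    str r2, [r3]              @ i = 0
    b .loopcheck              @ jump to the loop test
.loop:
    ldr r3, .i                @ R3 = address of i
    ldr r3, [r3]              @ R3 = value of i
    ldr r2, .i                @ R2 = address of i
    ldr r2, [r2]              @ R2 = value of i
    mov r2, r2, asl #1        @ R2 = i * 2 (shift left by one bit)
    ldr r1, .res              @ R1 = address of res
    str r2, [r1, r3, asl #2]  @ res[i] = R2 (stored at res + i * 4)
    ldr r3, .i                @ R3 = address of i
    ldr r3, [r3]              @ R3 = value of i
    add r3, r3, #1            @ R3 = i + 1
    ldr r2, .i                @ R2 = address of i
    str r3, [r2]              @ i = i + 1
.loopcheck:
    ldr r3, .i                @ R3 = address of i
    ldr r3, [r3]              @ R3 = value of i
    cmp r3, #9                @ compare i with 9
    ble .loop                 @ if i <= 9, go round the loop again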
It looks complicated, but each instruction performs a simple task. Together, they achieve the more complex task of building the 2-times table.
The first instruction, ldr r3, .i, loads the address of a memory location we have called ‘i’ into register R3. The second instruction, mov r2, #0, tells the CPU to put the value 0 into register R2. The third instruction is interesting: it states that the contents of register R2 should be stored into the memory location referenced by R3 (which, if you recall, is the location ‘i’). The end result of the third instruction is that the memory location called i now contains the value zero.
To try to explain this with a diagram, this is what is going on when the first instruction ldr r3, .i is executed:
The character 'i' is just a label for readability; it actually stands for the number of a memory location (its address). As an example, the label i could correspond to 8240, as shown in this example, which would mean memory location 8240. When the CPU processes this first instruction, it runs the circuitry inside the CPU that puts the value 8240 into register R3.
The diagram below shows what happens when the second instruction is executed. The CPU is instructed to activate the internal circuitry called 'mov', which in this case places the value zero into register R2. (Note: the difference between the ldr and mov instructions isn't relevant to this discussion, but in short, mov takes its value directly from the instruction itself, which is fast but restricts the values it can work with, whereas ldr loads its value from memory.)
The diagram below shows the result of the third instruction. This instruction treats the contents of R3 as an address: the address decoder locates that memory location and activates it, and the value in R2 is stored there.
Where do all these instructions come from? They were in a computer program, perhaps stored on the micro SD card, which the operating system (Linux) copied into some area of RAM. The CPU blindly executes instructions from wherever the program counter (PC) points. When a program is started, the operating system changes the PC value to point to the start of the program.
The fourth instruction in the program above, b .loopcheck, is a branch instruction. It tells the CPU to change the program counter (PC) value to the location called loopcheck. Each of the remaining instructions can be interpreted in the same way.
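To connect the assembly back to the pseudo-code, here is one plausible C rendering that mirrors the listing step by step. It is an assumption that the listing resembles unoptimised compiler output; the `goto` statements are used only so that the C layout matches the branch structure (an initial jump to the test at .loopcheck, which then branches back to .loop). An ordinary `for` loop would normally be used instead.

```c
int i;          /* the memory location the assembly calls 'i'     */
int res[10];    /* the memory locations starting at label 'res'   */

/* Mirrors the assembly listing: i lives in memory, doubling is a left
   shift, and the loop test sits at the bottom of the loop. */
void myprog(void)
{
    i = 0;                    /* ldr r3, .i / mov r2, #0 / str r2, [r3]        */
    goto loopcheck;           /* b .loopcheck                                  */
loop:
    res[i] = i << 1;          /* load i twice, shift left by 1, store at res + i*4 */
    i = i + 1;                /* load i, add 1, store the result back into i   */
loopcheck:
    if (i <= 9) goto loop;    /* cmp r3, #9 / ble .loop                        */
}
```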
After the first three instructions have been executed, the CPU registers may look something like this:
In the example register diagram above, it is assumed that the program is located in RAM at around location 8000 (the Pi has 512 MB of RAM, which allows many programs to fit, so a program does not necessarily start at location zero). Each instruction (usually) takes four memory locations, so if the program starts at location 8000, the program counter will be at 8012 once three instructions have been executed.
The rest of the code in the example above performs a multiplication, stores the result in a location marked ‘res’, and then loops again, storing the next result in a location 4 bytes further on, and so on. The 4-byte spacing is there because each result is stored as a 32-bit number, which occupies four bytes of memory.
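The address used by str r2, [r1, r3, asl #2] can be worked out by hand: R1 holds the base address of res, R3 holds i, and shifting i left by 2 bits multiplies it by 4. In the sketch below, the base address 8244 (just after a 4-byte i at 8240) is a hypothetical value chosen to match the earlier example.

```c
#include <stdint.h>

/* Address written by `str r2, [r1, r3, asl #2]`: base of res plus i * 4,
   because each 32-bit entry occupies four bytes. */
uint32_t element_address(uint32_t res_base, uint32_t i)
{
    return res_base + (i << 2);   /* i << 2 is the same as i * 4 */
}
```

With that hypothetical base, entry 0 lands at 8244, entry 1 at 8248 and entry 3 at 8256.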
After a few iterations, the RAM inside the Pi might look something like this:
You can see the beginnings of the 2-times table in the diagram.
Note that the machine instructions have semi-user-friendly names like ldr and mov but in reality they translate to numbers (known as operation codes or op-codes) stored in memory.
The first three instructions of the program above are stored in memory as these numbers (shown in hexadecimal, one 32-bit word per instruction):
e59f305c   (ldr r3, .i)
e3a02000   (mov r2, #0)
e5832000   (str r2, [r3])
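The op-codes are not arbitrary: groups of bits within each 32-bit word select the condition, the operation and the registers. The sketch below extracts a few fields following the 32-bit ARM instruction encoding for data-processing and single-data-transfer instructions; it deliberately decodes only what is needed for the three words above and is not a general disassembler.

```c
#include <stdint.h>

/* Field extraction for 32-bit ARM data-processing and load/store words. */
unsigned cond_field(uint32_t w)  { return (w >> 28) & 0xF; }  /* 0xE = always execute        */
unsigned rn_field(uint32_t w)    { return (w >> 16) & 0xF; }  /* base register (15 = PC)     */
unsigned rd_field(uint32_t w)    { return (w >> 12) & 0xF; }  /* destination/source register */
int      is_load(uint32_t w)     { return (w >> 20) & 1; }    /* L bit: 1 = ldr, 0 = str     */
unsigned dp_opcode(uint32_t w)   { return (w >> 21) & 0xF; }  /* data-processing op, 13 = MOV */

/* Low 12 bits: a plain byte offset for ldr/str; for data-processing
   instructions they encode a rotated 8-bit value (0 for mov r2, #0). */
unsigned imm12_field(uint32_t w) { return w & 0xFFF; }
```

Decoding 0xe59f305c this way gives a load (L = 1) into R3 from base register 15 (the PC) with offset 92, which suggests the assembler turned ldr r3, .i into a PC-relative load of a nearby word holding the address of i. 0xe3a02000 gives opcode 13 (MOV) with destination R2, and 0xe5832000 gives a store (L = 0) of R2 to the address in R3.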
The computer programmer rarely needs to know these values; software known as an assembler automatically translates the semi-user-friendly names into these numbers.
Machine instruction names like ldr and mov are not that easy to use either. In addition, they differ from one CPU family to another, and it would be awkward to have to re-learn them whenever a new, more modern computer was purchased. As a result, most users write programs in a more universal, easier-to-use programming language and rely on software known as a compiler to convert them into the specific machine instructions. Easy-to-use languages look very similar to the pseudo-code shown earlier.
I hope this blog post was useful in illustrating, at a high level, how the Pi functions. It is not essential to know this when developing code, but it is sometimes useful to have a rough idea of the inner workings. Depending on how deeply you wish to examine microprocessors, it can be useful to explore the ARM website and the Pi's hardware manual, and to experiment with the free gcc compiler, which can also assemble code.