To celebrate Pi Day and releasing XXICC rev 0.0j, I took a look at the VideoCore IV documentation. So far I've only read about the GPU, and I'm very impressed. It has 16 physical 32-bit floating-point processors, each with an adder/integer ALU and a multiplier which can operate simultaneously. Each group of 4 processors runs a SIMD stream with a minimum vector length of 16. I don't know what the clock speed is, but at 500 MHz you get 16 GFLOPs peak (16 processors x 2 operations per cycle x 500 MHz) if I've done the math correctly.
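For anyone who wants to check the arithmetic, here's a quick sketch in Python. The 500 MHz clock is a guess, not a documented figure, and the function name is made up for illustration. It assumes each processor completes one add and one multiply per cycle.

```python
def peak_gflops(processors, flops_per_cycle, clock_mhz):
    """Peak GFLOPs = processors x FLOPs per processor per cycle x clock rate."""
    return processors * flops_per_cycle * clock_mhz / 1000.0

# 16 processors, 2 FLOPs per cycle (adder + multiplier), assumed 500 MHz clock
print(peak_gflops(16, 2, 500))  # 16.0
```

The result scales linearly with the clock, so plug in the real frequency once you know it.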
The GPU architecture is quite similar to IBM's GF11 supercomputer, which could do 11 GFLOPs using 1983 technology -- the fastest supercomputer of its time. However, GF11 was a 24 foot by 24 foot room full of racks (about 20 MFLOPs per square foot) and IIRC used 200 kW of power plus air conditioning. VideoCore IV gets similar peak performance on a single chip with a few watts. OTOH, GF11 had a lot more dedicated memory per processor, and the hardest part of supercomputing is getting data to the ALUs: anyone can build a fast floating-point processor. So whether you can keep the ALUs busy depends on how many calculations you can do using your operands before having to get more from main memory. If you need to do lots of image processing OPs per pixel with its neighbors, you're in good shape.
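To put rough numbers on the keep-the-ALUs-busy point, here's a back-of-the-envelope sketch in Python using the usual roofline-style bound: sustained throughput is capped by either the peak compute rate or memory bandwidth times the FLOPs you do per byte fetched. The 4 GB/s bandwidth is a placeholder I picked for illustration, not a VideoCore IV spec, and the function name is hypothetical.

```python
def attainable_gflops(peak_gflops, mem_bandwidth_gb_s, flops_per_byte):
    """Roofline-style bound: the lower of the compute peak and the
    memory-limited rate (bandwidth x FLOPs per byte fetched)."""
    return min(peak_gflops, mem_bandwidth_gb_s * flops_per_byte)

PEAK = 16.0       # GFLOPs, the 500 MHz estimate from the first post
BANDWIDTH = 4.0   # GB/s, made-up placeholder, NOT a documented figure

# Few FLOPs per byte: memory-bound, most of the ALUs sit idle
print(attainable_gflops(PEAK, BANDWIDTH, 0.25))  # 1.0
# Many FLOPs per byte (e.g. heavy per-pixel neighborhood work): compute-bound
print(attainable_gflops(PEAK, BANDWIDTH, 8.0))   # 16.0
```

With these assumed numbers you'd need at least 4 FLOPs per byte fetched before the ALUs become the limit rather than main memory.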
So if you have a computationally intensive application with lots of near-neighbor calculations, check out VideoCore IV. It's an amazingly cheap supercomputer.
I wonder how releasing VideoCore IV will affect Parallella?
I didn't expect any of the other GPU IP companies to follow suit and unfortunately it looks like I was right. Personally I don't see this ever happening for any of the "big" players.
Obligatory: now if I could just get schematics for the A+, B+, and 2 (or at least have the USB updates added to what is currently available), I'd be able to recommend the Pis again.
Amazing news from RasPi: "A birthday present from Broadcom" (Raspberry Pi blog)
Eben Upton wrote:
Earlier today, Broadcom announced the release of full documentation for the VideoCore IV graphics core, and a complete source release of the graphics stack under a 3-clause BSD license. The source release targets the BCM21553 cellphone chip, but it should be reasonably straightforward to port this to the BCM2835, allowing access to the graphics core without using the blob. As an incentive to do this work, we will pay a bounty of $10,000 to the first person to demonstrate to us satisfactorily that they can successfully run Quake III at a playable frame rate on Raspberry Pi using these drivers.
I did not expect to see this from Broadcom given their history. But it's great news for the OSHW community, since it may prompt others to release documentation for their GPUs as well. Who knows, maybe this will prompt Xilinx and Cypress to let us program their chips at the bit level.
Update 12 June 2015: The "Broadcom announced" link is dead today. However, you can still download the VideoCore IV Architecture Reference Guide at: https://www.broadcom.com/docs/support/videocore/VideoCoreIV-AG100-R.pdf