96 Replies Latest reply on Jul 23, 2013 5:49 PM by John Beetem

    Raspberry Pi server clusters

    morgaine

      One of my current intentions is to play with server clustering once the Raspberry Pi is in volume production and the 1-per-person restrictions are lifted.  I have a long-term background in parallelism and concurrency --- my doctoral research was in the topic, and I lectured on it later as well, so it's quite dear to my heart.  The very low price of the board makes this feasible with a monetary outlay far below anything else, so I'm really looking forward to an Rpi clustering project.

       

      I'm sure that I'm not the only one thinking about Rpi+clustering.   If anyone here has this kind of application in mind, or just general interest in the subject, please keep in touch and post any interesting links you may find on the topic.  Once there are millions of the boards around, this could be a very popular area.

       

      Morgaine.

        • Re: Raspberry Pi server clusters
          GreenYamo

          Morgaine, that sounds very interesting and is something I would like to play around with; however, I am nowhere near your level of expertise.

           

          I do wonder how long it will take to be able to buy a cluster, though, when getting one seems hard enough.

          • Re: Raspberry Pi server clusters
            tronetix

            Was thinking same thing. Why just 1 Pi? How about 10.

            One of my first enclosure mechanical designs is to house 10 slide-in Pis, with suitable airflow. It would probably need some networking hub or router board and a backplane for the GPIO, USB, JTAG and power; get rid of the audio and RCA connectors, which are a mechanical nuisance anyway. Acrylic light pipes would bring the LEDs to the front for monitoring. A brick of Pis would be pretty powerful.

              • Re: Raspberry Pi server clusters
                morgaine

                Yes indeed Richard, and such a server cluster would probably consume very little power too if we can disable enough subsystems, especially the graphics core.  (Another possibility is to use the VideoCore for GPGPU/OpenCL type functions, but at this point it's unknown whether that really holds any potential.)

                 

                Slide-in modules are a must when talking about 10+ boards, as wiring them together by hand will lose its appeal really fast.  I'm currently building a 3D printer which I hope to employ to print out suitable module holders and the rest of the structural framework.

                 

                PS. I only think in powers of 2, and 16 seems a nice number as a basic Rpi cluster node from which larger clusters can be built.

                  • Re: Raspberry Pi server clusters
                    tronetix

                    I was looking at building my own 3D printer, then I realized it was better to just spend the effort on quality design with a high quality, low cost 3D design system and let the printer services, like Shapeways or ProPrinters, do their job better and at a reasonable price. I do 3D CAD along with PCB, so the overall design and manufacturing of computer and communications products is dear to me (in reality, a multi-decade trans-century curse).

                    Now, if a guy could individually purchase the heart of the Raspberry, the BCM2835, plunk 10 down on a board, add LAN circuitry for all 10 and have only 1 unit using the VideoCore, then a small form factor supercomputer could be formed based on that pretty nifty System on Chip processor the folks at Broadcom came up with. But then again, the overall price of the Raspberry as a demo unit for the Broadcom chip really trumps designing a new system. Besides, there are many other ARM processors also suitable for Linux and thus clustering, but I'm sure their individual cost plus assembly effort can't compete with simply stuffing a box full of already assembled Raspberrys.

                    I guess you could call this concept, the Raspberry Pi Raq

                      • Re: Raspberry Pi server clusters
                        morgaine

                        I like that idea, the "Raspberry Pi Raq".  Make the number of devices a power of 2 and I'll buy.

                         

                        I wouldn't use the BCM2835 if I were designing boards for this though, I'd pick a more modern ARM device, particularly given that the VideoCore is not helpful in this application.

                         

                        And, looking further into the future, I'd love to use the OpenRISC ASIC, which is projected to cost $5 in single units.  If only OpenCores would get themselves the financial backing of a major open source player like RedHat ...  Progress towards funding through personal donations doesn't seem to be working out too well.

                         

                        Morgaine.

                        • Re: Raspberry Pi server clusters
                          morgaine

                          @Richard: Referring back to my comment about using a more modern ARM if one were designing ARM cluster modules from scratch, one of my worries was that ARM's license fee for Cortex-A* cores would be too high.  After all, Broadcom probably chose an ARM1176 core for the BCM2835 purely to save money, we think.

                           

                          Well that worry was probably ill-founded, judging by the short survey of ARM CPU costings in the first two paragraphs of the Allwinner A10's page:   http://rhombus-tech.net/allwinner_a10/

                           

                          The license cost for Cortex-A8 must be very low indeed if the AM3358 price is $5 and the A10's price is $7.

                           

                          Having mentioned the Allwinner A10 CPU, it immediately springs to mind that Rhombus Tech's EOMA-68 CPU card seems to be a plug-in cluster module all ready and waiting to go when it becomes available.  The BOM cost of $15 makes it highly relevant here, since it was the very low price of Rpi that created this great niche, and I'm sure that lots of other participants want to play the game too.

                           

                          More references:

                           

                           

                          Morgaine.

                            • Re: Raspberry Pi server clusters
                              tronetix

                              Interesting,

                              I am just now finishing a PCB design for an aerospace client that uses an x86 CPU Mini PCI module. I had to add a lot of support and conversion circuitry, such as LVDS to DVI and a PCI bridge, with loads of high-speed, controlled-impedance routing. A module like the EOMA would have greatly simplified things, if not for the need to also support VGA, special FPGA sync RX and TX drivers, and serial ports in this seemingly archaic application. The Mini PCI connector probably costs more than the EOMA itself will. Besides, they pay me to design their nightmares, not engineer my own obsolescence. I think I need to explore the details of the Rhombus project more deeply for real world applications.

                               

                              In the meantime, the Raspberry is a quick turnkey product ready to tray and rack (if I had access to about 8 or 16, the power-of-2 thing), once things like power, booting, operating individually, plug and play and such have been handled. I imagine each Pi having a snap-on tray with a front panel carrying the Ethernet, USB and LED indicators. The tray would slide into the rack. Back access would allow SD card and power plug access. It would be nice if the HDMI could also be accessed from the front for individual maintenance if needed. Dealing with all the Ethernet connections and cabling would probably be the most challenging part. Adding Wi-Fi to each via USB might be cost prohibitive.

                                • Re: Raspberry Pi server clusters
                                  morgaine

                                  Yep, that mirrors my thoughts on the topic too, Richard.  Sticking the Pi into a bespoke plugin module is almost mandatory if one is going to use several, as hardwiring them together directly would be very unsatisfactory for a whole pile of reasons.  Designing a module with a little internal loom and replicating it is much more attractive.

                                   

                                  And you mentioned LEDs .... No self-respecting project should ever skimp on LEDs, we need loads!!!!!

                                   

                                  Morg.

                                    • Re: Raspberry Pi server clusters

                                      In yesterday's webinar, Eben talks about clusters, at 56:25. He recommends that for building a cheap supercomputer, you would be better off using x86 hardware (which is something he has mentioned before).

                                        • Re: Raspberry Pi server clusters
                                          morgaine

                                          What Eben said is quite right.  ARM isn't the world's fastest architecture, and within the ARM stable, the Pi's 700 MHz ARM1176JZF-S isn't going to win any speed races, so if one's goal is to build a supercomputer then using a pile of Rpi boards would not be very effective.

                                           

                                          But there are many more reasons for building clusters than just supercomputing.

                                           

                                          One very common one is to run some types of Internet services which use little computing power per session but require multiple sessions to cater for a useful number of simultaneous Internet users.  For that, a cluster of multiple Rpi boards could be very effective if the application fits the constraints of a single Rpi well.
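
A minimal sketch of that kind of workload in Python, assuming each board can comfortably hold a fixed number of light sessions. The node names, the per-node capacity and the function name `assign_sessions` are all hypothetical and chosen only for illustration; a real front end would more likely be an existing load balancer than hand-written code.

```python
from itertools import cycle


def assign_sessions(session_ids, nodes, capacity_per_node):
    """Spread sessions over nodes round-robin, honouring a per-node cap.

    Returns (load, rejected): a dict mapping node -> list of session ids,
    and a list of sessions that could not be placed anywhere.
    """
    load = {node: [] for node in nodes}
    ring = cycle(nodes)
    rejected = []
    for sid in session_ids:
        # Try each node at most once, starting where the ring left off.
        for _ in range(len(nodes)):
            node = next(ring)
            if len(load[node]) < capacity_per_node:
                load[node].append(sid)
                break
        else:
            # Every node is full: the cluster is out of capacity.
            rejected.append(sid)
    return load, rejected


if __name__ == "__main__":
    # Hypothetical 4-board cluster, 3 sessions per board, 14 incoming sessions.
    boards = ["pi0", "pi1", "pi2", "pi3"]
    placed, turned_away = assign_sessions(range(14), boards, 3)
    for board, sessions in placed.items():
        print(board, sessions)
    print("rejected:", turned_away)
```

The point of the sketch is that no session needs to know about any other, so capacity scales with the number of boards as long as one session fits within one board.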

                                           

                                          Also, ARM can beat most other processor families in performance per watt, so a cluster of Pi boards might well be able to hit a particular desired level of performance without requiring as much electrical power as say an Intel system.

                                           

                                          Another application of clusters that isn't supercomputing is High Availability or resilient computing.  If your server cluster provides a number of independent hardware nodes and a heartbeat mechanism then graceful degradation can be achieved, reducing the overall system performance only slightly when one board dies.  This can be important even for home sites, allowing you to provide an effective Internet presence without losing sleep through midnight alerts, and being able to relax when on holiday.  Who knew, ARM clusters have social benefits.
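
As a rough illustration of the heartbeat idea, here is a small Python sketch. It is only a model under stated assumptions: the class name `HeartbeatMonitor`, the node names and the timeout are hypothetical, and timestamps are passed in by hand rather than read from the network, so it shows the bookkeeping and not a real HA stack.

```python
class HeartbeatMonitor:
    """Track the last heartbeat seen from each node and report who is alive."""

    def __init__(self, nodes, timeout_s):
        self.timeout_s = timeout_s
        # None means "never heard from this node".
        self.last_seen = {node: None for node in nodes}

    def beat(self, node, now):
        """Record a heartbeat from `node` at time `now` (seconds)."""
        self.last_seen[node] = now

    def alive(self, now):
        """Nodes whose most recent heartbeat is within the timeout."""
        return [
            node
            for node, seen in self.last_seen.items()
            if seen is not None and now - seen <= self.timeout_s
        ]

    def capacity_fraction(self, now):
        """Fraction of the cluster still serving, i.e. the graceful degradation."""
        return len(self.alive(now)) / len(self.last_seen)


if __name__ == "__main__":
    nodes = [f"pi{i}" for i in range(8)]  # hypothetical 8-board cluster
    monitor = HeartbeatMonitor(nodes, timeout_s=3.0)
    for node in nodes:
        monitor.beat(node, now=0.0)
    # Five seconds later every board except pi3 reports in again.
    for node in nodes:
        if node != "pi3":
            monitor.beat(node, now=5.0)
    print(monitor.alive(now=6.0))
    print(monitor.capacity_fraction(now=6.0))
```

With 8 independent boards, losing one leaves seven eighths of the capacity, which is the "only slightly reduced" performance described above.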

                                           

                                          And finally, actual applications aside, some people like myself want to work with clusters for the simple reason that, despite this being officially the age of multicore, concurrent software still hasn't really caught up with the multicore hardware that's been appearing for several years now.  Concurrency is still a research area even today some decades after I did my PhD in the subject, so I'd like to continue exploring the topic a bit more, and multiple Rpis would provide a perfectly suitable platform without breaking the bank.

                                           

                                          And I'm sure that there are plenty of other good reasons for clustering Rpis as well, outside of supercomputing.

                                           

                                          Morgaine.

                            • Re: Raspberry Pi server clusters
                              kesmikcuz

                              Clustering was my plan as well, though I have zero experience in such. It'll be a fun project for sure!

                              • Re: Raspberry Pi server clusters
                                freads

                                Hello sir,

                                I'm an undergrad who is interested in cluster computing and parallel computing, and I would like to do this as my final year project to analyze signals and do some VLSI based simulations. I would like to know how you are going to deal with the Raspberry Pi's architecture, because it is fairly limited in terms of horsepower when you compare it to the A9 processor in the Apple TV which LMU scientists used to construct a cluster.

                                 

                                Thank you,

                                Kishore Kumar

                                  • Re: Raspberry Pi server clusters
                                    morgaine

                                    @freads: Clustering Pi boards for High Performance Computing (HPC) is not an effective use of the Pi.  I wrote about this at more length earlier in this thread, article #14.  What's more, Eben Upton said the same thing in the recent Element 14 webinar.

                                     

                                    My clustering requirements are not for HPC (as I explained before), but it sounds like yours are, since signal analysis and VLSI simulation are very demanding applications.  What's more, you will be quite likely to hit your head against the Pi's limited amount of memory in your application.

                                     

                                    On the face of it then, I would guess that you may be misapplying the Pi.

                                     

                                    Morgaine.

                                      • Re: Raspberry Pi server clusters

                                        In a different forum you would be told that a cluster of Pis is called a bramble, and is a reasonable thing to do.

                                          • Re: Raspberry Pi server clusters
                                            morgaine

                                            Oh but the comedy value of the fanbois over there is priceless!  Credit where credit is due.

                                              • Re: Raspberry Pi server clusters
                                                Roger Wolff

                                                My master's thesis was on parallel computing.

                                                The important thing to realize is that some applications are latency limited, and others are bandwidth limited.

                                                 

                                                In short, suppose you're doing a weather simulation, then after each timestep, for each cell you need the (new) pressure and other parameters for the neighboring cells to be able to do the next calculation for the current cell. If that information was calculated on another node, you'll have to wait for that information to go back and forth between those two nodes.

                                                On the other hand, some applications do not need the feedback from the previous step on the other node, so all that matters is the CPU speed, and the bandwidth between nodes.

                                                 

                                                In fact the Raspberry Pi isn't good at any of these. Networking would go through USB, which incurs a 1 ms latency every time you try to do something. USB is slow, and the CPU isn't fast.
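
To put rough numbers on the latency versus bandwidth point, here is a deliberately simple Python model of one simulation timestep. Only the 1 ms latency figure comes from the post above; the function names, the 10 ms of work per step, the 10 kB exchanged and the 100 Mbit/s link are assumptions picked for illustration, and the model ignores overlap of computation with communication.

```python
def step_time(work_s, nodes, latency_s, bytes_exchanged, bandwidth_bps):
    """Estimated wall-clock time of one timestep on `nodes` nodes.

    work_s          -- time one node alone needs for the whole step
    latency_s       -- fixed delay per neighbour exchange
    bytes_exchanged -- boundary data each node must receive per step
    bandwidth_bps   -- link speed in bits per second
    """
    compute = work_s / nodes
    if nodes == 1:
        return compute  # nothing to exchange on a single node
    transfer = bytes_exchanged * 8 / bandwidth_bps
    return compute + latency_s + transfer


def speedup(work_s, nodes, latency_s, bytes_exchanged, bandwidth_bps):
    """Speedup over a single node under the same simple model."""
    return work_s / step_time(
        work_s, nodes, latency_s, bytes_exchanged, bandwidth_bps
    )


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 64):
        s = speedup(0.010, n, 0.001, 10_000, 100e6)
        print(f"{n:3d} nodes: speedup {s:.2f}")
```

Under these assumed numbers the fixed 1 ms delay quickly dominates the shrinking compute share, so adding boards past a handful buys very little for tightly coupled work. With the latency and exchange set to zero the same model gives linear speedup, which is the case where only CPU speed matters.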

                                                 

                                                But it's still an interesting experiment. In an educational environment, you can for example teach students to program the 64-raspberry-pi-cluster and then later let them run for real on the bigger-better-faster cluster. For about $4000 you'd have yourself a good teaching-cluster. :-)

                                                 

                                                Hmmmmm... Adding 64 thin-bevel(*) widescreens would give you an impressive (15k x 8k) videowall. Wow!

                                                 

                                                 

                                                (*) Or what is the rim around the screen called?

                                                  • Re: Raspberry Pi server clusters
                                                    morgaine

                                                    Roger Wolff wrote:

                                                    > The important thing to realize is that some applications are latency limited,

                                                    > and others are bandwidth limited.

                                                     

                                                    The important thing to realize is that some parallel applications are neither latency limited nor bandwidth limited, because not all parallel applications are in the area of HPC.  But I already explained that point in post #14 and then again in #20.

                                                     

                                                    The fact that Rpi can offer neither good bandwidth nor good latency (nor good CPU performance) is immaterial in such applications, of which there are many.  Rpi is not likely to be effective for supercomputing, as Eben Upton explained in answer to a question.  But clusters are certainly not limited to HPC.

                                                     

                                                    What's more, your reference to video walls happens to provide a rather nice example of an application that can make good use of a cluster of Rpi boards without being latency limited nor bandwidth limited, for the simple reason that it typically uses no IPC at all.
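
A small Python sketch of why the video wall needs no IPC: each board can work out its own tile purely from its index. The 8 x 8 grid of 1920 x 1080 panels is an assumption (it is one layout that gives the "15k x 8k" figure quoted earlier), and the function names are hypothetical.

```python
def wall_resolution(cols, rows, panel_w, panel_h):
    """Total pixel size of a wall made of identical panels."""
    return cols * panel_w, rows * panel_h


def tile_for_node(index, cols, rows, panel_w, panel_h):
    """Rectangle (x, y, width, height) of the wall that node `index` displays.

    Nodes are numbered left to right, top to bottom, starting at 0.
    The result depends only on the node's own index, so no node ever
    has to talk to another one to know what to show.
    """
    if not 0 <= index < cols * rows:
        raise ValueError("node index outside the wall")
    row, col = divmod(index, cols)
    return col * panel_w, row * panel_h, panel_w, panel_h


if __name__ == "__main__":
    print(wall_resolution(8, 8, 1920, 1080))  # (15360, 8640)
    for node in (0, 9, 63):
        print(node, tile_for_node(node, 8, 8, 1920, 1080))
```

Each Pi would decode or render only its own rectangle of the source material, which is why neither the slow USB networking nor the interconnect latency matters much here.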

                                                    • Re: Raspberry Pi server clusters
                                                      freads

                                                      Hey, everyone knows that the Rpi isn't as powerful as a conventional supercomputer, but there are some applications which can make use of this low computing power. Things such as swarm robotics, 3D printers, etc. can take this job pretty well: instead of writing linear code you write parallel code. Weather simulation is a big deal for the Rpi; leave that to small miniaturized supercomputers built for stuff like biological analysis, like LittleFe http://littlefe.net/ (check the link). And also, the LMU project done by scientists is just a demonstration that anything can be done with ARM.

                                                      I still need to do more research to come to a conclusion. Guess this will be my final year project with some parallel code running on it.

                                            • Re: Raspberry Pi server clusters
                                              rrmcbeth

                                              OK, let's construct an 8 board Pi cluster.  On each Pi you install a 90 deg I/O port header.  Then design an interconnect board that these plug into.  The Pis mount vertically on the horizontal interconnect board.  This board distributes power/ground to the Pis and also connects all the I/O ports together for whatever we want to use them for, such as broadcast communication.  Some I/O ports could drive status LEDs for each board.

                                               

                                              Each Ethernet port is connected to an 8 port Ethernet switch/router for communication.  SD cards are accessible from the rear and the USB ports are accessible from the front.  The 8th board connects to a video terminal and keyboard/mouse/printer via a USB hub.  Video and keyboard/mouse are available from each board for troubleshooting.

                                               

                                              Each Pi is separated by about 1" so the whole thing (Pis, interconnect board, power supply and Ethernet switch) fits inside a shoebox.

                                               

                                               

                                                • Re: Raspberry Pi server clusters
                                                  morgaine

                                                  Physically, I think I'd start by designing a pluggable module enclosure for the Pi, just to make life with large N less of a bother.  A good quality rear-facing connector will have to be chosen for the module, perhaps something from the DIN41612 line.  Then N identical inner module looms can be assembled, an extremely boring operation but once done, you'll never again have to touch the wiring.

                                                   

                                                  I'd place the design on Thingiverse, naturally, and 3D print it on my own Shapercube if it's operational by then, or on a friend's printer if it's not.  (Or, your local hackerspace will be delighted to print it for you!)  RepRap-type 3D printing is ideal for this, since smooth surface finish is not required and you can iterate repeatedly at home and at very low cost until you have exactly what you want.

                                                   

                                                  It is possible that the pluggable module will have to be substantially bigger than the Pi, because the Pi's connectors come out in all directions and the module will have to accommodate the chosen connectors within its form factor.  Which Pi connections are brought out to the rear module connector needs to be decided based on two factors: first the core signaling issues (only those connections tolerant of poor impedance matching and significant crosstalk are likely to work well through a universal connector), and second the goals of the cluster builder.

                                                   

                                                  A spectrum of designs is likely to emerge based on the goals for which a cluster is built.  I suspect that the most popular type of cluster module will need only power and Ethernet to be routed to each board, although of course every self-respecting geek will also require front-panel LEDs to be routed to the GPIO lines.

                                                   

                                                  It should be mentioned that  we're discussing clustering Pi boards here because this is a Pi-oriented group, but the board's physical layout is poorly suited for building clusters, even for those applications where the board's capabilities are sufficient.  A better candidate for building clusters would be a board designed as a pluggable module from the start, such as Rhombus Tech's EOMA68 form factor module -- http://rhombus-tech.net/ .  Based on the very successful Allwinner A10 CPU (an ARM Cortex-A8) -- http://rhombus-tech.net/allwinner_a10/ -- and with an estimated BOM cost of $15, this would obviously be easier to employ as a cluster node directly.  Unfortunately it doesn't exist yet, so its better features are quite academic.   Keep an eye open on progress there though.

                                                   

                                                  For now, we'll just have to work with the Rpi's connections coming out at all angles, as no other Linux board is available in the same price bracket.  If all you need to connect is power and Ethernet, it's very easily manageable anyway.

                                                   

                                                  Morgaine.

                                                • Re: Raspberry Pi server clusters
                                                  bobmendon

                                                  That fits in well with my interest in introducing low-power sustainable computers into the network environment. The only difference is that I am working on plans for a low-power server using all SSDs. After I launch my desktop, which is still in beta, sometime in the next few months, the server is next, along with some investigation of mini-clustering. It would be interesting to compare stats later on in my project in terms of energy footprint and performance. Right now I'm working in the mini-ITX format and looking at the mini (3.5") format for a follow-on. Another area in which to explore clustering is VMware environments and even setting up mini SANs.

                                                    • Re: Raspberry Pi server clusters
                                                      morgaine

                                                      > "introducing low-power sustainable computers into the network environment"

                                                       

                                                      I like your phrase a lot, Bob.

                                                        • Re: Raspberry Pi server clusters
                                                          bobmendon

                                                          I'm pretty proud of the "sustainable" part considering I built a fully functional desktop PC from off-the-shelf parts. It has no moving parts other than the on/off switch, meets or exceeds all international standards for content of toxins and green efficiency, can be constructed with nothing more than a screwdriver and only uses 33 watts of power at peak operation. The PC is 95% recyclable. I am looking forward to boards like the RPi coming up in performance enough to drive an SSD and handle more memory, which would allow them to drive more resource intensive OSes... (read Windows). As it is, the RPi looks like a pretty good platform for running a VMware image. I hope to experiment with that.

                                                            • Re: Raspberry Pi server clusters
                                                              morgaine

                                                              The "sustainable" word was one of the things that attracted me to RepRap 3D printing, because the project founder Adrian Bowyer mentioned that old extruded items could (in principle) be melted down and recycled into plastic filament ready for your next print.  He gave a very alluring example: when your child outgrows her sandals, you melt down the plastic, add a bit more filament, and print out a slightly larger pair ... with the added punch line that PLA polymer is biodegradable and can be made out of corn that you've grown in your back yard.

                                                               

                                                              Of course, reality is a lot different to principle, and extraction of pure ABS or PLA from old sandals would be difficult, not to mention that extruding filament to the required fine tolerance is not something that you can do at home (yet).  And I'm not into farming corn.  Still, the thought is there.

                                                               

                                                              The real reason I'm mentioning 3D printing though is in answer to your building low-power sustainable computers, because 3D printing makes it so easy to build enclosures that evolve with your requirements.  My printer isn't completed yet, but once it is, it'll be near the top of my list of applications to create pluggable modules for various things, including microcontrollers and Raspberry Pi cluster nodes, as I discussed here earlier.

                                                               

                                                              The low power consumption of ARM and the ability to construct plastic enclosures to suit your specific needs seem to go together rather well, and provide ample opportunity to reuse parts from obsoleted or failed equipment.

                                                               

                                                              Morgaine.

                                                        • Re: Raspberry Pi server clusters
                                                          morgaine

                                                          Hooray,  objective measurements are starting to appear as more people receive their Pi boards:

                                                           

                                                           

                                                          It's very important to match what the board can offer against the specific requirements of your application, otherwise it can lead to disappointment.  Also, it's crucial not to make a one-dimensional analysis when multiple factors come into play, as cand's article highlights --->  the Pi's Ethernet performance can approach the full bandwidth of the line, but only at a huge CPU cost.

                                                           

                                                          First indications then suggest that networked applications of Pi are likely to be most successful when the network use is occasional and when not much else has to be executed concurrently with the communications.

                                                           

                                                          This will need a lot more careful analysis so that we know exactly where the bottlenecks are.  Some of them are quite likely to be remedied by kernel config improvements or with better drivers, but one has to know the detailed cause of a problem before one can tackle it.

                                                           

                                                          Morgaine.

                                                            • Re: Raspberry Pi server clusters
                                                              morgaine

                                                              The following is about HPC which isn't my focus, but interesting nevertheless.  From Slashdot today:

                                                               

                                                              "Phoronix constructed a low-cost, low-power 12-core ARM cluster running Ubuntu 12.04 LTS and made out of six PandaBoard ES OMAP4460 dual-core ARMv7 Cortex A9 chips. Their results show the ARM hardware is able to outperform Intel Atom and AMD Fusion processors in performance-per-Watt, except it sharply loses out to the latest-generation Intel Ivy Bridge processors."

                                                               

                                                              More at:  http://linux.slashdot.org/story/12/06/16/1510257/12-core-arm-cluster-beats-intel-atom-amd-fusion

                                                               

                                                              Morgaine.

                                                                • Re: Raspberry Pi server clusters
                                                                  Colin Barnard

                                                                  Probably not quite what you're thinking of but the Design Spark forum (the competition) has an article on someone using Pis to make a VAX cluster.

                                                                   

                                                                  Colin

                                                                      • Re: Raspberry Pi server clusters

                                                                        There is an interesting article benchmarking a 12-core ARMv7 cluster against some PCs.

                                                                         

                                                                        http://www.phoronix.com/scan.php?page=article&item=phoronix_effimass_cluster&num=1

                                                                          • Re: Raspberry Pi server clusters
                                                                            morgaine

                                                                            Yeah, saw that.  It's very underwhelming though.  I'm a bit surprised that ARM (the company) isn't working harder in that area and dedicating more research to multicore clustering.  A reference board from them bearing say 16 dual-core Cortex-A9 MPcore chips would really fan the flames wonderfully.

                                                                             

                                                                            Intel can still warrant its total complacency about server-side ARM because a modern Atom core is nearly an order of magnitude faster than the best ARM core, so it doesn't have to push very strongly.  Well ARM can't do anything about the speed difference yet, but because it has the power advantage over Atom it really ought to use that to promote high core counts in clusters and start closing the gap with sheer core numbers.  It's not doing that, and I'm not too sure why not.

                                                                             

                                                                            Morgaine.

                                                                              • Re: Raspberry Pi server clusters

                                                                                There must be better ARM chips in the pipeline. At least Dell & HP are said to have ARM-based servers in the works and I can't see them being the only ones. Better performance per watt for ARM only goes so far. I can see it being a tough sell if you need 10x the number of servers to meet some performance point.

                                                                                • Re: Raspberry Pi server clusters
                                                                                  John Beetem

                                                                                  Morgaine Dinova wrote:

                                                                                   

                                                                                  I'm a bit surprised that ARM (the company) isn't working harder in that area and dedicating more research to multicore clustering.  A reference board from them bearing say 16 dual-core Cortex-A9 MPcore chips would really fan the flames wonderfully.

                                                                                  I would hazard a guess that ARM feels that the licensees are doing a fine job on their own, e.g.,

                                                                                  Calxeda EnergyCore:

                                                                                  Calxeda's so-called EnergyCore will come in versions with two and four Cortex A9 cores running at 1.1 to 1.4 GHz and sharing 4 Mbytes L2 cache. The chip includes an 80 Gbit/s fabric switch capable of supporting 4,096 nodes.

                                                                                   

                                                                                  The EnergyCore supports up to five 10 Gbit/s ports. The SoC includes up to three 10 Gbit/s Ethernet MACs, four PCI Express Gen 2 links and five 3 Gbit/s serial ATA interfaces. It also includes an ARM M3 core to run server management software and supervise power management tasks.

                                                                                   

                                                                                  The Calxeda device delivers about two-thirds to two-fifths the performance of a Westmere-class Intel Xeon 5620 four-core server processor, depending on the targeted application, said Karl Freund, vice president of marketing for Calxeda.

                                                                                  The article includes a photo of a nice-looking four-chip server card.

                                                                                   

                                                                                  In other news from last year, AMCC is working on a 64-bit ARMv8 server chip: AMCC demos 64-bit ARM server chip.

                                                                                    • Re: Raspberry Pi server clusters
                                                                                      morgaine

                                                                                      That Calxeda chip looks nice, John.  But I still think that ARM should prime the pump with a reference design; it's not enough to leave it to a few specialist licensees.  In addition to raising the level of activity in ARM clustering, it would establish some sort of standard for licensees to follow.  Without that you can more or less guarantee that every specialist's clustering technology will be different --- a nightmare for everyone, including for Linux.

                                                                                       

                                                                                      Morgaine.

                                                                                        • Re: Raspberry Pi server clusters
                                                                                          John Beetem

                                                                                          Morgaine Dinova wrote:

                                                                                           

                                                                                          That Calxeda chip looks nice...

                                                                                          More news about the Calxeda chip:

                                                                                          ARM Server Ships From Boston Limited (pcworld.com):

                                                                                          Boston Limited on Monday said it was manufacturing and distributing a low-power server with ARM-based chips, becoming one of the few companies to make such a server commercially available.

                                                                                          The Viridis server has the EnergyCore chip from Calxeda...

                                                                                            • Re: Raspberry Pi server clusters
                                                                                              morgaine

                                                                                              pcworld.com wrote:

                                                                                               

                                                                                               

                                                                                              The Boston Viridis server has up to 48 Calxeda chips -- 192 ARM cores in a 2U enclosure -- with integrated networking and storage units. Each Calxeda chip consumes as little as 5 watts per chip, U.K.-based Boston said in a statement.

                                                                                               

                                                                                              Interesting, for the server farm, but I'd be happy with a much more down-to-earth 16 ARM cores of Cortex-A vintage in a 1U form factor, with gigabit Ethernet and SATA.  Pretty low spec really.

                                                                                                • Re: Raspberry Pi server clusters

                                                                                                  I'd be happy with a much more down-to-earth

                                                                                                  It appears that at least the first generations are going to target datacenters, blades, and Windows.

                                                                                                  Microsoft mandating that 'secure boot' can't be disabled on ARM servers has already been widely reported elsewhere and could make it much more difficult to use alternative operating systems in the future.

                                                                                                  Thankfully that doesn't appear to be the case with the Boston server as reported; hopefully they continue to produce systems along those lines.

                                                                                                   

                                                                                                  Looking at it, there appear to be 12 processor cards in that system. Presuming it's modular enough, a single card would give you 16 cores and could be integrated into a smaller chassis?
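The 16-cores-per-card figure above follows from the numbers quoted earlier in the thread. A minimal sketch of that arithmetic, assuming the chips are spread evenly across the cards; the 12-card count is a visual estimate from the photo rather than a confirmed specification, and the function name is purely illustrative:

```python
# Figures quoted in this thread for the Boston Viridis (not verified specs).
CHIPS_PER_CHASSIS = 48    # "up to 48 Calxeda chips"
CORES_PER_CHASSIS = 192   # "192 ARM cores in a 2U enclosure"
MIN_WATTS_PER_CHIP = 5    # "as little as 5 watts per chip"
CARDS_PER_CHASSIS = 12    # assumption: card count estimated from a photo


def per_card_breakdown(chips, cores, cards, watts_per_chip):
    """Split chassis-level totals evenly across processor cards."""
    if chips % cards != 0 or cores % chips != 0:
        raise ValueError("totals do not divide evenly")
    cores_per_chip = cores // chips
    chips_per_card = chips // cards
    return {
        "cores_per_chip": cores_per_chip,
        "chips_per_card": chips_per_card,
        "cores_per_card": cores_per_chip * chips_per_card,
        # Chip power only; networking, storage and PSU losses are not included.
        "min_watts_per_card": chips_per_card * watts_per_chip,
    }


card = per_card_breakdown(
    CHIPS_PER_CHASSIS, CORES_PER_CHASSIS, CARDS_PER_CHASSIS, MIN_WATTS_PER_CHIP
)
print(card)
```

If the even-split assumption holds, each card carries four quad-core chips, and the chip-only power floor per card is about 20 W.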

                                                                                                    • Re: Raspberry Pi server clusters
                                                                                                      morgaine

                                                                                                      selsinork wrote:

                                                                                                       

                                                                                                      Looking at it, there appear to be 12 processor cards in that system. Presuming it's modular enough, a single card would give you 16 cores and could be integrated into a smaller chassis?

                                                                                                       

                                                                                                      In principle, perhaps.  But in practice a company that is targeting data centers isn't likely to be interested in the other end of the market.  The breakthrough, if it happens, will come from an ARM licensee that isn't afraid to sell ARM application processors for $5 in volume, like TI did with the AM3358/9.

                                                                                • Re: Raspberry Pi server clusters
                                                                                  NeilM

                                                                                  Seems like it's already a cluster. Just a different kind of cluster from the one you mean! Reminds me of Inigo Montoya!

                                                                                  • Re: Raspberry Pi server clusters
                                                                                    morgaine

                                                                                    More news in this area:

                                                                                     

                                                                                     

                                                                                    Summary quotes:

                                                                                     

                                                                                    • "Each Slab CPU node consists of a Marvell quad-core 1.33-GHz Armada XP ARM chip, 2 GB of ECC RAM, a Cogent Computer Systems CSB1726 SoM, and a 30 GB solid-state drive"
                                                                                    • "32 cores into a half-depth 1U server".

                                                                                     

                                                                                    I'm really surprised that there is so little of this kind of development going on.  It can barely be called a sector yet, few players and only background activity.  Is there even one such product on open sale from a large distributor?

                                                                                     

                                                                                    Morgaine.

                                                                                      • Re: Raspberry Pi server clusters

                                                                                        >I'm really surprised that there is so little of this kind of development going on.

                                                                                         

                                                                                        I don't think the MIPS/$ or MIPS/watt work out too well yet compared to the i7.

                                                                                         

                                                                                        In the rpi.org front page blog of 24 July 2012,

                                                                                        http://www.raspberrypi.org/archives/1655

                                                                                         

                                                                                        Pete Stevens compares the RPi to the Mac Mini for web hosting:

                                                                                         

                                                                                        "Is this sensible? We’ve had a few customers ask us if the Raspberry Pi would be a sensible device for hosting on as it’s very cheap and very low power. Unfortunately it’s also very slow for this kind of application and the supporting hardware is very bulky. The i7 quad core Mac Mini occupies less space than the Pi + hub + disk + PSU, uses about five times as much electricity, costs about five times as much once you include the supporting hardware but is hundreds of times faster. So revolutionising the hosting industry isn’t going to happen with the Raspberry Pi, at least not until they build a PoE one with gigabit ethernet and more RAM."
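To put rough numbers on that quote, here is a small sketch of the implied performance-per-watt and performance-per-cost ratios. The two 5x figures come from the quote; the 100x speedup is only an assumed lower bound for "hundreds of times faster", and the function name is illustrative:

```python
# Ratios of Mac Mini to Raspberry Pi setup, taken from the quote above.
POWER_RATIO = 5.0   # "about five times as much electricity"
COST_RATIO = 5.0    # "costs about five times as much"

# Assumption: "hundreds of times faster" is read as at least 100x.
# The quote gives no exact figure, so treat this as a placeholder.
ASSUMED_SPEEDUP = 100.0


def relative_efficiency(speedup, resource_ratio):
    """How much more work per unit of resource the faster machine delivers."""
    if resource_ratio <= 0:
        raise ValueError("resource ratio must be positive")
    return speedup / resource_ratio


perf_per_watt = relative_efficiency(ASSUMED_SPEEDUP, POWER_RATIO)
perf_per_cost = relative_efficiency(ASSUMED_SPEEDUP, COST_RATIO)
print(f"performance per watt advantage: {perf_per_watt:.0f}x")
print(f"performance per cost advantage: {perf_per_cost:.0f}x")
```

Even with the conservative 100x placeholder, the Mac Mini comes out well ahead on both measures for this workload, which is the point the quote is making.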

                                                                                          • Re: Raspberry Pi server clusters
                                                                                            morgaine

                                                                                            That rpi.org quote is about the Pi though, a very old ARM machine which was never designed for CPU performance nor network throughput, nor even power efficiency for networking loads.  It can't be used as any kind of reference in the ARM server appliances field.

                                                                                             

                                                                                            ARM plugtops have for some time now had gigabit networking and integrated disk controllers.  There's no huge reason why the same level of ARM technology couldn't be reworked into a 1U network appliance featuring a few of these SoCs interconnected by nothing more than a gigabit switch, for only a few hundred pounds BOM cost.  And higher models based on quad-core Cortex-A9 MPcore or Cortex-A15 too.

                                                                                             

                                                                                            I bet there would be plenty of takers ... starting with me!

                                                                                             

                                                                                            Morgaine.

                                                                                              • Re: Raspberry Pi server clusters

                                                                                                The RPF is rumored to be planning to release benchmark results soon, showing the RPi doing quite well against more modern ARM CPUs.

                                                                                                Of course, comparing a single-core CPU to a multi-core one is difficult.

                                                                                                 

                                                                                                The Cortex-A15 is yet to be seen, I believe.

                                                                                                  • Re: Raspberry Pi server clusters
                                                                                                    John Beetem

                                                                                                    coder27 wrote:

                                                                                                     

                                                                                                    The RPF is rumored to be planning to release benchmark results soon, showing the RPi doing quite well against more modern ARM CPUs.

                                                                                                    Of course, comparing a single-core CPU to a multi-core one is difficult.

                                                                                                     

                                                                                                    The Cortex-A15 is yet to be seen, I believe.

                                                                                                    I would like to see independent benchmark results, so there's no conflict of interest.  But there's a general problem with benchmarks: how well is the benchmark going to predict any application other than the benchmark?  It's YMMV on steroids.

                                                                                            • Re: Raspberry Pi server clusters
                                                                                              morgaine

                                                                                              Just linking here for reference a list of basic design notes I made in another thread, as it's relevant to ARM clusters --- http://www.element14.com/community/message/61225#61225 .

                                                                                               

                                                                                              Designing modules specifically for clustering is different to designing boards for standalone operation, as the operational requirements are really quite different.

                                                                                               

                                                                                              What's more, the cluster design rules also apply if you're designing a standalone computer which happens to use a clustering architecture internally.  So, if you're making a standalone board containing multiple ARM processors, the design rules for clusters should be applied if you want an effective architecture, not the rules that you would use for a single-SoC board.

                                                                                              • Re: Raspberry Pi server clusters
                                                                                                John Beetem

                                                                                                Just saw this at Geek Times: ARM, LSI on-chip link connects up to 32 cores

                                                                                                 

                                                                                                LSI Corp. and Calxeda Inc. are the first chip companies to license from ARM Holding plc a new on-chip interconnect developed for linking up to 32 cores on a die.  ARM's CoreLink CCN-504, delivering throughput in the range of 50-100 Gbits/s, will debut in LSI’s first ARM-based devices to be announced in February.

                                                                                                • Re: Raspberry Pi server clusters
                                                                                                  John Beetem

                                                                                                  AMD is planning to make 64-bit ARMs for servers.

                                                                                                   

                                                                                                  From ZDNet:

                                                                                                   

                                                                                                  AMD has announced that it is teaming up with ARM to develop 64-bit ARM processors for servers to meet growing challenges for data centers. "AMD will transform the computing data center environment today," said AMD CEO and president Rory Read during a press conference on Monday afternoon, asserting that AMD will be the first company to offer both 64-bit ARM and x86 server processors.

                                                                                                    • Re: Raspberry Pi server clusters
                                                                                                      morgaine

                                                                                                      More interesting news in this area:

                                                                                                       

                                                                                                       

                                                                                                      One thing that surprises me is that Intel aren't building up a server market presence based on multiple clustered Atom chips.  Indeed, Atom seems to be almost a stealth product for them, very low key, and that's pretty odd when the future clearly forecasts competition in performance/watt from ARM.

                                                                                                       

                                                                                                      Morgaine.

                                                                                                        • Re: Raspberry Pi server clusters
                                                                                                          John Beetem

                                                                                                          Morgaine Dinova wrote:

                                                                                                           

                                                                                                          One thing that surprises me is that Intel aren't building up a server market presence based on multiple clustered Atom chips.  Indeed, Atom seems to be almost a stealth product for them, very low key, and that's pretty odd when the future clearly forecasts competition in [performance]/watt from ARM.

                                                                                                          From my recollection of microprocessor history, Intel has never been into low power.  Intel's technological model is squeezing lots and lots of fast transistors onto a piece of element 14 and they are IMO better at doing this than anyone else.  This has allowed them to be lazy about architecture, since transistor performance has so far been able to win.  But all those fast transistors waste a lot of power and require mechanical cooling, which is one of the big reasons PowerPC has had much more success in industrial and automotive applications.  (ARM partners like TI and Freescale are now going after these applications.)

                                                                                                           

                                                                                                          Another thing is profit margin.  Intel chips have always been really expensive.  Again from my recollection of uP history, I think the original quantity 1 price of the Intel 8080 was US$300 (not adjusted for inflation).  Yes, you could get an 8008 and its support chips for less, but the 8080 had much better performance and instruction set -- and it got a lot cheaper really fast.  However, whenever a new Intel uP comes out, it's generally in the same price range.  Compare to RasPi's US$5 SoC.

                                                                                                           

                                                                                                          When a large company with large profit margins is faced with technology that can dramatically improve price/performance, they often retrench and sabotage internal efforts to take advantage of new technology.  Basically, they don't want internal products with improved price/performance to compete with the old ways of doing things which have enjoyed high profit margins.  Too often the large company stalls internal efforts so long that the company isn't ever able to recover.

                                                                                                           

                                                                                                          My favorite example of this phenomenon is the IBM PCjr, which came out in 1984 -- the same year as the original Macintosh.  The PCjr could have taken advantage of newer Intel SoCs such as the 80186 and produced much better performance than the 1981 IBM PC, at much lower cost.  However, IBM didn't want the lower profit margin PCjr to take business away from the older PC, so they made sure the PCjr didn't compete by sabotaging its performance and giving it a toy keyboard.  Well, PCjr didn't compete with the PC -- or with anything else.

                                                                                                           

                                                                                                          JMO/YMMV

                                                                                                            • Re: Raspberry Pi server clusters
                                                                                                              morgaine

                                                                                                              Hmmm, dunno.  Deliberate internal sabotage of developments that could lead to better technology at lower profit margins is one possible explanation, but it's singularly short-sighted when there is a possible external competitor coming over the horizon.  Assuming that Intel does do forward planning, the likelihood seems low to me.

                                                                                                               

                                                                                                              It also doesn't seem very likely for a second reason:  Atom exists, and works very well.  I don't know what the current state of play is, but a couple of years ago it was winning head-to-head reviews against all comers on performance per watt.  Nobody (sane) sabotages a winning product, surely?

                                                                                                               

                                                                                                              That said, the performance of ARM has improved massively in the last few years, so perhaps the situation has changed, not in Intel's favour.

                                                                                                              • Re: Raspberry Pi server clusters
                                                                                                                morgaine

                                                                                                                The CEO of ARM, Warren East, says in an interview at http://www.technologyreview.com/news/507116/moores-law-is-becoming-irrelevant/ :

                                                                                                                 

                                                                                                                "To me a PC is really just a smartphone in another form factor. [cut]  TVs are the same.

                                                                                                                TVs are big smartphones. Computers are kind of medium smartphones."

                                                                                                                 

                                                                                                                I quote it mainly because it made me chuckle, and although it's to be expected that the ARM CEO would say such things, there's quite a lot of truth in it as well.  Computers are intrinsically the same, whatever the niche.  And as he says later in the interview, ARM certainly wasn't designed expressly for smartphones.

                                                                                                                 

                                                                                                                I just wish ARM would do something a little more explicit in the direction that their heads regularly speak about.  Without cluster interconnect becoming available as an optional but integral part of the ARM architecture so that we don't have a Tower of Babel of incompatible interconnects, ARM-based servers will have a hard time becoming ubiquitous.

                                                                                                                 

                                                                                                                Morgaine.

                                                                                                                  • Re: Raspberry Pi server clusters
                                                                                                                    7point62

                                                                                                                    Morgaine Dinova wrote:

                                                                                                                     

                                                                                                                    The CEO of ARM, Warren East, says in an interview at http://www.technologyreview.com/news/507116/moores-law-is-becoming-irrelevant/ :

                                                                                                                     

                                                                                                                    "To me a PC is really just a smartphone in another form factor. [cut]  TVs are the same.

                                                                                                                    TVs are big smartphones. Computers are kind of medium smartphones."

                                                                                                                     

                                                                                                                    I quote it mainly because it made me chuckle, and although it's to be expected that the ARM CEO would say such things, there's quite a lot of truth in it as well.  Computers are intrinsically the same, whatever the niche.  And as he says later in the interview, ARM certainly wasn't designed expressly for smartphones.

                                                                                                                     

                                                                                                                    I just wish ARM would do something a little more explicit in the direction that their heads regularly speak about.  Without cluster interconnect becoming available as an optional but integral part of the ARM architecture so that we don't have a Tower of Babel of incompatible interconnects, ARM-based servers will have a hard time becoming ubiquitous.

                                                                                                                     

                                                                                                                    Morgaine.

                                                                                                                     

                                                                                                                    I hope that our Warren has his tongue embedded firmly in his cheek, or perhaps he's only concerned with his particular corner of the hardware world. Computer = smartphone = telly? Hmmm... perhaps in consumerland where its only real tasks are to give access to media, "rich web content" (whatever that is), adverts, spam, online shopping, more spam and then to become obsolete just in time for next gen. tech then maybe so. But, for folks like me who only really tolerate computers because they are good at doing hard sums very quickly then I fear he's talking cobblers.

                                                                                                                     

                                                                                                                    If ARM is to become ubiquitous then it will have to offer a bit more than low power (in terms of Watts and flops) at bargain bucket prices. It's a bit of a chicken and egg scenario, where potential adopters don't bite unless they are confident about format longevity and future legacy support (a non-consideration with consumer devices, but essential in industry). Industrial software types may similarly balk at turning out high value, low volume product for a platform that's "not quite done yet" - especially as not all ARM hardware is created equal... The chip makers themselves probably aren't going to toss in features that are currently seen as niche in the hopes of attracting a few customers when consumer grade whatnot and low cost high volume embedded applications are ticking along quite nicely. Oh, I forgot the need for a fit-and-forget operating system that software and hardware manufacturers will have enough confidence in to universally support.

                                                                                                                     

                                                                                                                    The trick will be to nudge things over that R0>1 tipping point, but there are a bazillion little details (and one big roadmap) to finalise first.

                                                                                                                      • Re: Raspberry Pi server clusters
                                                                                                                        morgaine

                                                                                                                        Jonathan Garrish wrote:

                                                                                                                         

                                                                                                                        The trick will be to nudge things over that R0>1 tipping point, but there are a bazillion little details (and one big roadmap) to finalise first.

                                                                                                                         

                                                                                                                        I'm glad you pointed out the little issue of roadmap, because it's a very important issue in industry despite having no importance whatsoever in the consumer gadgets sector.  Industrial and commercial players need to know that the ARM-based server that they'll buy tomorrow from Dell and others is going to have an evolutionary path for many years ahead before they'll start investing in non-x86 software.  Currently there is no indication of a concrete roadmap in that area from ARM whatsoever, AFAIK.

                                                                                                                         

                                                                                                                        ARM likes dropping vague hints about servers and about ARM licensees delivering the goods through competition in the market, but very oddly they totally fail to realize that they have a crucial role to play in establishing the foundations upon which a server sector will be based.  There's a lot more to it than merely defining an ISA and telling licensees to get on with it.  That doesn't inspire confidence among prospective buyers at all.

                                                                                                                         

                                                                                                                        To bring the aggregate performance of a server based on low-power ARM chips up to that of a modern Intel/AMD server requires a lot of cores, and ARM can't use an SMP architecture for this like Intel and AMD are currently doing.  Shared memory has extremely limited scalability, and a lot of cores would rapidly hit the ceiling even with fancy multi-level caching architectures (which introduce their own problems anyway, lots of them).

                                                                                                                         

                                                                                                                        The scalable way for ARM to go is with a clustering approach instead, using on-chip interconnect hardware for parallel communication between cores on the same chip or on the same board without distinction.  This would allow server boards to scale to an arbitrary number of cores both on-chip and on the server motherboard.  It's not rocket science either, as the transputer pioneered that architecture back in the 80's.

                                                                                                                         

                                                                                                                        But for that to happen, ARM needs to make the interconnect a standard feature that ARM licensees can add to their ARM SoCs, a standard feature supported by standard instructions so that we don't end up with the Tower of Babel I mentioned above.  And I don't see ARM doing anything like that yet.

                                                                                                                         

                                                                                                                        Morgaine.

                                                                                                                          • Re: Raspberry Pi server clusters
                                                                                                                            morgaine

                                                                                                                            It's worth adding that such on-chip interconnect hardware would have tremendous impact far beyond the limited area of ARM server communications.  Just imagine the possibilities if your Cortex-A application processors could talk to your Cortex-M microcontrollers at gigabit rates on separate links instead of crawling along at SPI or I2C speeds on shared buses.  Suddenly a whole new class of applications becomes possible.

                                                                                                                              • Re: Raspberry Pi server clusters
                                                                                                                                michaelkellett

                                                                                                                                If the Transputer architecture was so great how is it that there are no transputers now?

                                                                                                                                The ideas live on in XMOS and while they aren't bust they are only achieving niche success on a very small scale.

                                                                                                                                 

                                                                                                                                The reality is that parallelism at the core level is far from sorted - it isn't rocket science (we know how to make rockets).

                                                                                                                                 

                                                                                                                                There are lots of core level experimental parallel schemes afoot, GPUs, Greenchip, XMOS, Propeller etc -- none of them seem to be that compelling (except perhaps GPUs).

                                                                                                                                 

                                                                                                                                So - since you pose the question - what are the possibilities of your Cortex A linked to Cortex M that you can't do right now with the Xilinx Zynq ?

                                                                                                                                 

                                                                                                                                Michael Kellett

                                                                                                                                  • Re: Raspberry Pi server clusters
                                                                                                                                    John Beetem

                                                                                                                                    Michael Kellett wrote:

                                                                                                                                     

                                                                                                                                    There are lots of core level experimental parallel schemes afoot, GPUs, Greenchip, XMOS, Propeller etc -- none of them seem to be that compelling (except perhaps GPUs).

                                                                                                                                     

                                                                                                                                    So - since you pose the question - what are the possibilities of your Cortex A linked to Cortex M that you can't do right now with the Xilinx Zynq ?

                                                                                                                                    There was a very good article in IEEE Spectrum last year by Peter Kogge on Next-Generation Supercomputers.  I found this to be the most interesting take-away:

                                                                                                                                    "The good news is that over the next decade, engineers should be able to get the energy requirements of a flop down to about 5 to 10 pJ.  The bad news is that even if we do that, it won't really help.  The reason is that the energy to perform an arithmetic operation is trivial in comparison with the energy needed to shuffle the data around, from one chip to another, from one board to another, and even from rack to rack."

                                                                                                                                    This has been my experience as well: building high-throughput processing engines is easy.  The difficult part is getting operands to them so they can do useful work.  This is why DSPs have specialized high-speed multi-port memories, but they're small and only work for small data blocks that get reprocessed many times.  A GPU that acts as a SIMD pipeline is also very effective for some applications.  But you can't expect a general application to get much sustained performance without a lot of work to make it fit the parallelism of the hardware.  It's easy to get peak performance, defined by a wag as "a guarantee from the manufacturer that you won't go faster than this".
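To put rough numbers on the quoted figures, here is a minimal Python sketch of the arithmetic. The 5 to 10 pJ per flop range comes from the quote; the machine size of 10^18 flop/s is chosen here purely for illustration, and `pj_moved_per_flop` is a hypothetical placeholder for the data-movement cost, which the quote does not quantify.

```python
def arithmetic_power_mw(flops_per_second, pj_per_flop):
    """Power spent on arithmetic alone, in megawatts.

    1 pJ = 1e-12 J, and joules per second are watts.
    """
    watts = flops_per_second * pj_per_flop * 1e-12
    return watts / 1e6


def total_power_mw(flops_per_second, pj_per_flop, pj_moved_per_flop):
    """Arithmetic plus data movement, in megawatts.

    pj_moved_per_flop is an assumed average energy spent moving
    operands for each flop; it is a placeholder, not a measured value.
    """
    return arithmetic_power_mw(flops_per_second,
                               pj_per_flop + pj_moved_per_flop)


if __name__ == "__main__":
    rate = 1e18  # assumed sustained rate, for illustration only
    for pj in (5, 10):
        mw = arithmetic_power_mw(rate, pj)
        print(f"{pj} pJ/flop -> {mw:.0f} MW for arithmetic alone")
```

Plugging your own estimate for data movement into `total_power_mw` shows how quickly that term swamps the arithmetic once it is several times larger per flop.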

                                                                                                                                     

                                                                                                                                    IMO the obvious solution is to design the parallel processor's architecture to match the parallelism of the application.  If the application is a good match to GPUs, use GPUs.  If it's a good match to FPGAs and their huge amount of processing (provided that you can get operands to the processing elements), use FPGAs.  However, there's a big non-technical problem: GPU and FPGA vendors won't give you direct access to their architectures, so work in using these incredibly powerful engines for parallel processing is advancing very slowly.  It's much easier just to network up a bunch of high-end x86 CPUs and pay the electric bill.

                                                                                                                                     

                                                                                                                                    We could do a hell of a lot with a Xilinx Zynq -- if we could program the logic array directly.

                                                                                                                                    • Re: Raspberry Pi server clusters
                                                                                                                                      morgaine

                                                                                                                                      Michael Kellett wrote:

                                                                                                                                       

                                                                                                                                      If the Transputer architecture was so great how is it that there are no transputers now?

                                                                                                                                       

                                                                                                                                      The transputer didn't take off back in the 80's simply because it was way ahead of its time, and there was not yet any need for a solution to the problem that it solved.  The evolution of the single core microprocessor still had decades of opportunity ahead of it.  Increasing the clock speed of a CPU was a comparatively trivial method of increasing its performance, so single CPUs vanquished all other contenders in the industry.  No mystery there.

                                                                                                                                       

                                                                                                                                      The reality is that parallelism at the core level is far from sorted - it isn't rocket science (we know how to make rockets).

                                                                                                                                       

                                                                                                                                      That's certainly true.  I did my PhD in that very topic, parallelism and concurrency, so I had a first-hand opportunity to experience the many problems in that domain, as well as to examine the very wide spectrum of candidate solutions that people have conjured up to deal with it.  There is no shortage of solutions, but you're right that it's "far from sorted" in one particular sense --- although many candidates work just fine, no particular solution has been embraced by the world at large.  In part this is a consequence of non-SMP multicore hardware simply not being widely available yet, outside of GPUs.

                                                                                                                                       

                                                                                                                                      The bigger problem though is not with technology, but with people.  More specifically, it is a problem with people who are so attached to their beloved language that runs well only on a single core that they are not willing to face the fact that languages are just engineering tools, and you need to pick the right tool for the job or you're banging in screws with a hammer.  That message fails to get through, probably because the majority of the world's programmers are not engineers at heart but language craftsmen.  The message is unwelcome and is rejected.

                                                                                                                                       

                                                                                                                                      There are lots of core level experimental parallel schemes afoot, GPUs, Greenchip, XMOS, Propeller etc -- none of them seem to be that compelling (except perhaps GPUs).

                                                                                                                                       

                                                                                                                                      GPUs offer one solution to parallelizing computation through schemes such as OpenCL, and it's certainly nice to see that concept gaining traction.  The GPU manufacturers realize and implement what the CPU manufacturers are mostly failing to embrace, that the future of computing is to employ thousands or millions or billions of cores, and in consequence your programming methodology has to change.  Actually, I bet that the CPU manufacturers do realize it, but perhaps don't know how to get past the problem of programmer inertia that I mentioned above.

                                                                                                                                       

                                                                                                                                      XMOS may be the start of something interesting, but I think their chances of survival as a minnow in a shark-infested sea are minimal.  If they get bought out by a large player eager to take on Intel and AMD then things could get very entertaining.  ARM is certainly aware of them through one of their founders, so who knows what the future holds.  Perhaps if ARM had a roadmap ... :-)

                                                                                                                                       

                                                                                                                                      Propeller is just an eclectic approach to parallelising embedded microcontrollers, and doesn't pretend to be anything else.  Although it's cute and quite effective in its domain, it doesn't offer anything for general purpose computing.

                                                                                                                                       

                                                                                                                                      So - since you pose the question - what are the possibilities of your Cortex A linked to Cortex M that you can't do right now with the Xilinx Zynq ?

                                                                                                                                       

                                                                                                                                      I'm guessing now, but if you meant "Can't the Zynq's FPGA be configured to provide the hardware interconnect?" then the answer is "Yes, but only poorly".  Asynchronous serial communication at very high data rates is a task best done by dedicated silicon, and it needs to be supported by the ISA to be most effective.  The transputer got that right too, among so many other things.

                                                                                                                                       

                                                                                                                                      But the worst part of doing the interconnect in a Zynq's FPGA is that no other ARM licensee will have the technology.  Clearly that is no way to create a multi-provider ecosystem.  ARM has to define the needed foundations, ie. standard links and a standard ISA to use them.

                                                                                                                                       

                                                                                                                                      Morgaine.

                                                                                                                                        • Re: Raspberry Pi server clusters
                                                                                                                                          morgaine

                                                                                                                                          It's possible that those who weren't in the field at the time of the transputer might not picture the interconnect architecture being proposed above, so here's a very brief summary.

                                                                                                                                           

                                                                                                                                          Each CPU core has access to either 4 or 6 point-to-point bidirectional self-synchronizing full-duplex serial links, 4 to construct simple 2D sheets and 6 to construct simple 3D volumes (many topologies are of course possible with this number of links, but simple is best as a starting point).  These links are optimized for speed of message transfer from the core at one end of the link to the core at the other end, and work identically from a software perspective regardless of whether the messaging is between two cores on a single piece of silicon or two cores on different chips.

                                                                                                                                           

                                                                                                                                          Message transfers are handled by scatter-gather DMA controllers, and a core is not involved at all during such transfers in or out of its private memory space.  If it requires notification of completion of a transfer in either direction then the interrupt system takes care of it in the normal manner, but this isn't necessary for pure transit messages (those that are just passing through the node).  A transit message destined for a different node is automatically passed from the DMA controller on the incoming link to the DMA controller on the outgoing link, and the message just squirts through the node without bothering the CPU.  (Implicit in this is that messages carry either destination node numbers or link routing descriptors.)

                                                                                                                                           

                                                                                                                                          That's the essence of it.  The focus is very much on simplicity and speed in this interconnect, as well as standard functionality.  The single-minded goal is to get data from the private address space of one core to that of another core, quickly, in a system with an arbitrary number of cores.  Everything else is secondary.
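As a rough illustration of how a transit message could find its way across a 2D sheet built from four links per node, here is a minimal Python sketch. It assumes each message carries a destination node number in the form of (x, y) grid coordinates and uses simple dimension-order routing (travel along x first, then y). That routing policy, the link names `E`/`W`/`N`/`S`, and the function names are assumptions made for this example; they are not taken from the transputer or from any ARM specification.

```python
from typing import List, Optional, Tuple

Node = Tuple[int, int]  # (x, y) position of a core in the 2D sheet

# Hypothetical names for the four links and the neighbour each one reaches.
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def next_link(here: Node, dest: Node) -> Optional[str]:
    """Pick the outgoing link for a message at `here` bound for `dest`.

    Returns None when the message has arrived and should be delivered
    into the local core's private memory instead of being forwarded.
    """
    hx, hy = here
    dx, dy = dest
    if hx != dx:                      # correct the x coordinate first
        return "E" if dx > hx else "W"
    if hy != dy:                      # then the y coordinate
        return "N" if dy > hy else "S"
    return None


def route(src: Node, dest: Node) -> List[Node]:
    """List every node the message visits, including src and dest."""
    path = [src]
    here = src
    while True:
        link = next_link(here, dest)
        if link is None:
            return path
        sx, sy = STEP[link]
        here = (here[0] + sx, here[1] + sy)
        path.append(here)


if __name__ == "__main__":
    print(route((0, 0), (2, 1)))
```

In hardware, the decision made by `next_link` would sit between the incoming and outgoing DMA controllers, so the intermediate nodes in `route` forward the message without their CPUs being involved. Extending the sketch to six links and (x, y, z) coordinates gives the 3D case.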

                                                                                                                                           

                                                                                                                                          Morgaine.

                                                                                                                                    • Re: Raspberry Pi server clusters
                                                                                                                                      Roger Wolff

                                                                                                                                      Morgaine Dinova wrote:

                                                                                                                                      I'm glad you pointed out the little issue of roadmap, because it's a very important issue in industry despite having no importance whatsoever in the consumer gadgets sector.  Industrial and commercial players need to know that the ARM-based server that they'll buy tomorrow from Dell and others is going to have an evolutionary path for many years ahead before they'll start investing in non-x86 software.  Currently there is no indication of a concrete roadmap in that area from ARM whatsoever, AFAIK.

                                                                                                                                      One of the things is that for Intel the server market is an "evolution" of their existing market share. So they can plan ahead and have new processors for the server market in the pipeline.

                                                                                                                                       

                                                                                                                                      ARM however, doesn't have a foothold in the server market. If their server-chip-experiment fails, they will end up with a lot of money down the drain, and they'll have to struggle to survive.

                                                                                                                                       

                                                                                                                                      In that case, they won't continue throwing money at the dead project. So I understand that they cannot plan beyond their first server-chips.

                                                                                                                                       

                                                                                                                                      The problem is that software is SO VERY important that most likely the architecture switch won't happen.

                                                                                                                                      It has been shown time and time again that the installed-base-software-compatible processor wins. Add (slow) hardware X86 emulation support and suddenly you've got a much bigger chance of succeeding because you provide an upgrade path for those having older software. That's what made AMD64 succeed.

                                                                                                                                       

                                                                                                                                      (When emulating another architecture, having hardware support for the basics helps a lot. We tried emulating x86 on an architecture our group designed back in the late 1980s. It turned out that 90% of the instructions were dealing with the difference in flag-setting of the emulated instructions compared to the native computer. Having that in hardware speeds things up enormously.)
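To give a feel for why flag handling dominates, here is a minimal Python sketch of what a software emulator has to compute for a single 8-bit x86 ADD when the host has no matching flags. The flag definitions are the standard x86 ones; the function name and the dictionary layout are just choices made for this example, not part of any real emulator.

```python
def add8_with_flags(a: int, b: int):
    """Emulate an 8-bit x86 ADD and derive the six status flags in software."""
    total = a + b
    result = total & 0xFF
    flags = {
        # Carry: unsigned overflow out of bit 7.
        "CF": total > 0xFF,
        # Zero: the 8-bit result is zero.
        "ZF": result == 0,
        # Sign: bit 7 of the result.
        "SF": bool(result & 0x80),
        # Overflow: operands share a sign that differs from the result's.
        "OF": bool(~(a ^ b) & (a ^ result) & 0x80),
        # Auxiliary carry: carry out of bit 3 (used for BCD arithmetic).
        "AF": (a & 0x0F) + (b & 0x0F) > 0x0F,
        # Parity: set when the low byte has an even number of 1 bits.
        "PF": bin(result).count("1") % 2 == 0,
    }
    return result, flags


if __name__ == "__main__":
    print(add8_with_flags(0x7F, 0x01))
```

The addition itself is one line; the other six are bookkeeping that the emulated program may never even read, which is the kind of overhead that hardware support removes.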

                                                                                                                                        • Re: Raspberry Pi server clusters
                                                                                                                                          John Beetem

                                                                                                                                          Roger Wolff wrote:

                                                                                                                                           

                                                                                                                                          The problem is that software is SO VERY important that most likely the architecture switch won't happen.

                                                                                                                                          It has been shown time and time again that the installed-base-software-compatible processor wins. Add (slow) hardware x86 emulation support and suddenly you've got a much bigger chance of succeeding because you provide an upgrade path for those having older software. That's what made AMD64 succeed.

                                                                                                                                          Pardon me while I fire up my IBM PC XT/370

                                                                                                                                           

                                                                                                                                          Actually, these days ARM-based computing devices way outsell x86 computing devices when you include smart phones and tablets.  When Google's Dual Cortex-A15 Chromebook starts shipping in earnest for US$249 the inverted pendulum will swing even further in ARM's direction.  The x86 will still have its place for people who need higher performance or if software is not available on ARM (such as FPGA design), but most people will do just fine with ARM and ARM will take over those applications just like x86 won out over System/370 due to better price/performance and performance/watt.

                                                                                                                                           

                                                                                                                                          JMO/YMMV

                                                                                                                                  • Re: Raspberry Pi server clusters
                                                                                                                                    morgaine

                                                                                                                                    On our earlier topic of ARM versus Atom, this comparison of a new Cortex-A15 versus an Atom from 2011 is rather eye-opening --- http://www.anandtech.com/show/6422/samsung-chromebook-xe303-review-testing-arms-cortex-a15/ .

                                                                                                                                     

                                                                                                                                    Executive summary:  ARM wins on idle, but consumption is in the same ballpark for both when running flat out.  The performance figures favour ARM in this comparison, although one should bear in mind that the Atom in question was an old one.

                                                                                                                              • Re: Raspberry Pi server clusters
                                                                                                                                jardino

                                                                                                                                Hello!

                                                                                                                                 

                                                                                                                                I've been interested in building a cluster of computers ever since I read "How to Build a Beowulf" many years ago, but had neither time nor resources to do so until now.

                                                                                                                                 

                                                                                                                                The thinking in that book was that as Personal Computers became more powerful, so less powerful ones should drop in price. However, that didn't happen - they just disappeared from the market. So unless you were lucky to find an organisation that was about to dump dozens of PCs, it was difficult to collect enough machines to get started. And then there were considerations about maintainability, power consumption and waste heat removal...

                                                                                                                                 

                                                                                                                                However, now that Raspberry Pi's are readily available from Farnell and others, I've built myself a 4-node Beowulf, following on from the starting instructions given by Professor Cox of Southampton University ( http://www.southampton.ac.uk/~sjc/raspberrypi/ ). I'm now able to compile and run sample programs in C and Fortran, using MPI, and have started to write my own programs.
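For readers who have not seen the shape of such MPI programs: most of them split a loop across ranks, let each rank compute a partial result, and then combine the partials (in real MPI, typically with `MPI_Reduce` and `MPI_SUM`). The sketch below shows only that decomposition in plain Python, with the ranks simulated in a loop so it runs without MPI. It is not taken from the programs mentioned above; the function names, the sum-of-squares workload and the choice of 4 ranks (to match a 4-node cluster) are assumptions for illustration.

```python
def block_range(rank, size, n):
    """Indices of 0..n-1 owned by `rank` when n items are split over `size` ranks.

    The first (n % size) ranks receive one extra item, so the load is
    balanced to within one item.
    """
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def partial_sum_of_squares(rank, size, n):
    """Work that one rank would do on its own slice."""
    return sum(i * i for i in block_range(rank, size, n))


def simulated_reduce(size, n):
    """Stand-in for the final reduction: add up every rank's partial result."""
    return sum(partial_sum_of_squares(rank, size, n) for rank in range(size))


if __name__ == "__main__":
    for rank in range(4):
        print(rank, list(block_range(rank, 4, 10)))
    print(simulated_reduce(4, 10))
```

On a real cluster each rank would run `partial_sum_of_squares` in its own process on its own node, and only the final combination would cross the network.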

                                                                                                                                 

                                                                                                                                I'm aware of the reasons why an RPi Beowulf can never be a true "supercomputer", but my objective for now is simply to learn about parallel processing, so performance is not a key issue for me at the moment.

                                                                                                                                 

                                                                                                                                I plan to add RPi nodes at the rate of one or two a month as my budget permits. Soon I'll have to address the hardware engineering issues, as my little Beowulf simply lives in a plastic Tesco storage box at the moment!

                                                                                                                                 

                                                                                                                                I've shared my learning so far on the raspberrypi.org forum at http://www.raspberrypi.org/phpBB3/viewtopic.php?f=41&t=548 , but interest there seems to have died out.

                                                                                                                                 

                                                                                                                                I hope it's not going to do the same here, although the discussion about clusters seems to have moved away lately from RPi's to more exotic architectures.

                                                                                                                                 

                                                                                                                                Alan.   

                                                                                                                                  • Re: Raspberry Pi server clusters

                                                                                                                                    Alan,

                                                                                                                                      What kind of linpack numbers are you getting on your 4-node cluster?

                                                                                                                                    I think the initial interest in RPi clusters died out when people realized that the CPU was slow, the GPU was inaccessible, the network is high latency, layered on USB 2.0, network booting is unavailable, and connectors come out from all sides.

                                                                                                                                      Why aren't you using a quad-core PC for learning mpi?

                                                                                                                                      • Re: Raspberry Pi server clusters
                                                                                                                                        jardino

                                                                                                                                        coder27 wrote:

                                                                                                                                         

                                                                                                                                          What kind of linpack numbers are you getting on your 4-node cluster?

                                                                                                                                         

                                                                                                                                        I've not run linpack yet. My timing tests have been limited to using WallClock and the "time" command on the command line.

                                                                                                                                        However, I'll be happy to get some numbers once my Beowulf is up and running again. (I've just cannibalised the master node to make a media player - replacement RPi due this week!)

                                                                                                                                         

                                                                                                                                        I think the initial interest in RPi clusters died out when people realized that the CPU was slow, the GPU was inaccessible, the network is high latency, layered on USB 2.0, network booting is unavailable, and connectors come out from all sides.

                                                                                                                                        Well, as I said, I'm not too concerned about performance at the moment.

                                                                                                                                        Aren't there moves afoot to make the GPU accessible? In any case, I don't want to work at that level. My main interest is in scientific programming. Anything useful that I develop I would want to be transferable to a "real" supercomputer.

                                                                                                                                        The connectors are not yet an issue, since everything lives in a plastic storage box just now. However, they will need some thought when I have to engineer the system properly.

                                                                                                                                         

                                                                                                                                          Why aren't you using a quad-core PC for learning mpi?

                                                                                                                                         

                                                                                                                                        Well, I don't have such a machine and probably couldn't afford one. However, I have a handful of RPis. Also, I want to get away from Wintel and into Linux.

                                                                                                                                        And I can't cannibalise a core to use as a media centre!

                                                                                                                                         

                                                                                                                                        Alan.

                                                                                                                                          • Re: Raspberry Pi server clusters

                                                                                                                                            > Well, as I said, I'm not too concerned about performance at the moment.

                                                                                                                                             

                                                                                                                                            That's fine.  But you noted that others had lost interest in clustering RPi's, and I think their initial interest was due in large part to a mistaken assumption that adding enough nodes would be a cost-effective way to get decent performance.

                                                                                                                                             

                                                                                                                                            > Aren't there moves afoot to make the GPU accessible?

                                                                                                                                             

                                                                                                                                            No, there are not.  This is another widely held, but mistaken assumption. The GPU instruction set is highly proprietary to Broadcom, and there is no general-purpose interface, like OpenCL, nor any plans for such an interface.

                                                                                                                                          • Re: Raspberry Pi server clusters
                                                                                                                                            Problemchild

                                                                                                                                            coder27, the main reason for doing an RPi cluster is to get your organisation onto the front page of whatever media you are targeting.

                                                                                                                                            For real work the idea sucks... looks well cool though.

                                                                                                                                          • Re: Raspberry Pi server clusters
                                                                                                                                            morgaine

                                                                                                                                            Hi Alan!  Welcome to Element 14.

                                                                                                                                             

                                                                                                                                            It was great to read your experiences with setting up a small cluster of Pi boards.  I read your posts on the RPF forum too, detailing your voyage of discovery.  Your writeup will probably be quite helpful to other people who want to follow the same path.

                                                                                                                                             

                                                                                                                                            Although the Pi is clearly the wrong board to use if one's goal is HPC (even Eben Upton said as much), as long as "This is not for HPC" is clearly implanted in one's skull then the low price of Pi certainly makes cheap experimentation with clusters viable on a shoestring.  What's more, since nodes run headless and connected only by their Ethernet ports, this is an application that can probably run unaffected by the Pi's USB hardware problems.

                                                                                                                                             

                                                                                                                                            I have to take issue with the dear Prof calling his project "Steps to make a Raspberry Pi Supercomputer".  It's not a supercomputer and never can be, it's simply a cluster.  Profs should not lower themselves to riding on hype bandwagons and claiming something which they know full well to be untrue.

                                                                                                                                             

                                                                                                                                            Using such a cluster for non-HPC applications should certainly be interesting though, as we discussed at some length earlier in this thread.  A high availability web server would be one such application, focussing not on performance but on resilience.  The front ends would be easy to set up because they don't need to communicate with each other when serving web pages, but clustering your back-end database would certainly be interesting, and even challenging to do well.
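As a rough illustration of why the front ends are the easy part: because they hold no shared state, a dispatcher only has to remember which nodes are currently alive and hand each request to the next healthy one. The sketch below is a toy round-robin selector in Python, not a real load balancer; the class name `FrontEndPool`, its methods and the node names are all invented for this example, and health checking itself (how a node gets marked down) is left out.

```python
class FrontEndPool:
    """Round-robin choice among stateless front-end nodes, skipping dead ones."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0  # index of the node to try first on the next pick

    def mark_down(self, node):
        """Record that a node failed a health check."""
        self.healthy.discard(node)

    def mark_up(self, node):
        """Record that a known node has recovered."""
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self):
        """Return the next healthy node, or raise if none are left."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy front ends available")


if __name__ == "__main__":
    pool = FrontEndPool(["pi1", "pi2", "pi3"])
    print(pool.pick(), pool.pick())
    pool.mark_down("pi3")
    print(pool.pick(), pool.pick())
```

Nothing comparable works for the database tier, since the nodes there must agree on the data, which is where the challenge mentioned above comes from.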

                                                                                                                                             

                                                                                                                                            It should be mentioned that while Pi is not directly useful for HPC, the picture starts to change when more modern ARM SoCs are considered.  In particular, the newly released Freescale i.MX6 SoC such as used on the Wandboard provides not only the extra power of Cortex-A9 cores but also on-SoC gigabit Ethernet and apparently OpenCL as well, so it certainly seems to be a candidate for modest HPC applications.

                                                                                                                                             

                                                                                                                                            I look forward to reading more as your project progresses, especially if you add High Availability features.  It's worth pointing out that much of this material is relevant whichever ARM boards are used in a cluster.

                                                                                                                                             

                                                                                                                                            Keep up the good work!

                                                                                                                                             

                                                                                                                                            Morgaine.

                                                                                                                                              • Re: Raspberry Pi server clusters
                                                                                                                                                Roger Wolff

                                                                                                                                                The Raspberry Pi is quite good for high performance computing, but we need someone with a lot of time on his hands and some technical talent to get mad at the VideoCore graphical processor. We have a whole bunch of "sample code" out in the open. Once someone reverse engineers that and writes an assembler/compiler, we'll be able to do gigaflops on the Raspberry Pi.

                                                                                                                                                 

                                                                                                                                                @coder27: network booting unavailable? I was just thinking about this yesterday. It should be possible to write a program that acts as a PXE boot ROM. Store it as "kernel.img" on the SD card, and off you go! This one is easier. It's just that someone needs to have this "itch" to make him scratch it. I crashed my kernel tens of times on Monday, so I couldn't copy the newer kernel.img over to the running system. Out comes the SD card, into the reader, update the kernel, and back to the Pi. But after ten or twenty iterations you find what crashes your kernel. (It was the printk, NOT the actual code I added, that crashed the kernel, so I was under the impression that it kept crashing even when I removed all the code that did something...)

                                                                                                                                                  • Re: Raspberry Pi server clusters

                                                                                                                                                    >Once someone reverse engineers that and writes an assembler/compiler

                                                                                                                                                     

                                                                                                                                                    There is a very early attempt here:

                                                                                                                                                    https://github.com/hermanhermitage/videocoreiv/wiki/VideoCore-IV-Programmers-Manual

                                                                                                                                                    • Re: Raspberry Pi server clusters
                                                                                                                                                      morgaine

                                                                                                                                                      Roger Wolff wrote:

                                                                                                                                                       

                                                                                                                                                      The Raspberry pi is quite good for high performance computing, but we need someone with a lot of time on his hands and some technical talent to get mad at the VideoCore graphical processor.

                                                                                                                                                       

                                                                                                                                                      Wrong tense.  Being generous, try "would be" in place of "is".

                                                                                                                                                       

                                                                                                                                                      And even the "would be" is totally hypothetical in practice, since RPF have stated point blank that they are not working on it, and they keep saying that such work would be impossible for anyone outside a well resourced team like Broadcom's.  Even Eben Upton doesn't pretend that HPC is Pi's forte.

                                                                                                                                                       

                                                                                                                                                      And finally, while that hypothetical version of Pi might well process things fast in each VideoCore, a cooperative cluster of Pi nodes would still be limited by its slow Ethernet.  So I don't really think you're betting on a good horse for HPC there.

                                                                                                                                                       

                                                                                                                                                      I've always said that there are several quite viable non-HPC roles for clusters of Pi though, as long as you avoid triggering the USB issues.

                                                                                                                                                    • Re: Raspberry Pi server clusters
                                                                                                                                                      jardino

                                                                                                                                                      Hi Morgaine:

                                                                                                                                                       

                                                                                                                                                      Thanks for the welcome and kind words!

                                                                                                                                                       

                                                                                                                                                      Yes, the "dear Prof" got me started on my Beowulf, then left me high and dry after showing how to run a pre-compiled C program. The learning curve to getting my own programs to compile and run on a 2-node system was quite steep, but fun. He also failed to address (or tell us about) aspects such as program version control on multiple nodes and system management. For instance, it takes me about 20 minutes to create a cloned SD card on my netbook, so re-building a 64-node system using his method would take nearly three 8-hour days! (Personally, I think his announcement was just for publicity purposes. I mean, tiny computers, supercomputers, Lego and a six-year-old child push lots of hot buttons in the press. I wonder if he will get his cluster to do anything useful.)
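The "nearly three 8-hour days" figure can be checked with a few lines of Python. This assumes the cards are written strictly one at a time on a single machine, at the 20 minutes per card quoted above; the function name `reimage_time` is just a label for this example.

```python
def reimage_time(nodes, minutes_per_card, hours_per_day=8):
    """Total time to clone one SD card per node, one card at a time.

    Returns (total_hours, working_days).
    """
    total_minutes = nodes * minutes_per_card
    total_hours = total_minutes / 60
    return total_hours, total_hours / hours_per_day


if __name__ == "__main__":
    hours, days = reimage_time(64, 20)
    print(f"{hours:.1f} hours, {days:.2f} eight-hour days")
    # prints "21.3 hours, 2.67 eight-hour days"
```

So 64 cards at 20 minutes each is 1280 minutes, a little over 21 hours of card writing, which is where the need for some form of central image management comes from.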

                                                                                                                                                       

                                                                                                                                                      I'm still going to focus on scientific / mathematical programming for a while, since that's my main interest. However, I'll delve into High Availability systems at some point.

                                                                                                                                                       

                                                                                                                                                      Regards,

                                                                                                                                                      Alan.