6 Replies Latest reply on Aug 15, 2019 1:23 PM by drozwood90

    ZU+ FSBL fails to access BRAM on "cold" start

    mboechat

      I hope someone can help me figure out why the FSBL is successful only after the board has been running for a while.

      I am developing on the UltraZed-EG board and its carrier (IOCC) -> http://zedboard.org/product/ultrazed-EG. It has a Zynq UltraScale+ XCZU3EG-SFVA625.

      My goal is to test the DDR, so I can't use it as program memory. The program has to run from the SD card, so I can't use the OCM either, since it is reserved by the bootloader.

      I have modified the out-of-the-box setup by adding a 256KB BRAM at address 0x8000_0000 in Vivado 2018.3 (plus a few other irrelevant changes).

      aj_blockDiagram.png

      In Xilinx SDK 2018.3 I have created 4 projects,

      • The Hardware Platform Specification (importing the .HDF file from Vivado after generating Bitstream)
      • The Board Support Package
      • The FSBL based on the template
      • One Hello World program (to keep this question simple, will be replaced with DDR testing code)

       

      For the FSBL project, I have modified the compiler Symbols according to https://www.xilinx.com/support/answers/69754.html

      aj_compilerSymbols.png

      And I have modified the bsp's page table (MMUTableL2) accordingly (fsbl_bsp>psu_cortexa53_0>libsrc>standalone_v6.8>src>translation_table.S)

       

      .rept     0x0200               /* 0x8000_0000 - 0xBFFF_FFFF */
      .8byte     SECT + Memory          /* 1GB lower PL, changed Device -> Memory */
      .set     SECT, SECT+0x200000
      .endr
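As a sanity check on the range in that comment: 0x200 entries of 2 MB each cover exactly 1 GB (0x8000_0000 - 0xBFFF_FFFF), and a 256KB BRAM at 0x8000_0000 falls entirely inside the first of those entries. A minimal C sketch of that arithmetic follows; the helper names are made up for illustration and are not part of the BSP.

```c
#include <stdint.h>

#define SECTION_SIZE   0x200000ULL     /* 2 MB per level-2 block entry (the .set increment) */
#define PL_LOWER_BASE  0x80000000ULL   /* first address covered by the .rept 0x0200 block   */

/* Index, within the .rept 0x0200 block, of the entry that maps addr.
 * Hypothetical helper; addr is assumed to be >= PL_LOWER_BASE. */
static uint64_t section_index(uint64_t addr)
{
    return (addr - PL_LOWER_BASE) / SECTION_SIZE;
}

/* Number of 2 MB entries touched by [base, base + size). size must be > 0. */
static uint64_t sections_spanned(uint64_t base, uint64_t size)
{
    return section_index(base + size - 1) - section_index(base) + 1;
}
```

For example, `sections_spanned(0x80000000ULL, 256 * 1024)` is 1, while the last address of the range, 0xBFFF_FFFF, maps to entry index 0x1FF.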

       

      I have also enabled more verbose debug messages from the FSBL in xfsbl_config.h

       

      #define FSBL_DEBUG_DETAILED_VAL (1U)

       

      I have modified the linker script for the Hello world program to generate two .elf files, one targeted to the DDR, one targeted to BRAM. I then use the bootgen GUI in the SDK (select Hello World project > Xilinx > Create Boot Image) to generate two .bin files that can be loaded on the SD card, for comparison.

       

      When loading to the DDR, it works every time. When loading to the BRAM, the behavior is strange:

      • If the board has been off for more than ~2 minutes, it fails. A subsequent boot quickly after will fail too.
      • If the board has been on for more than ~30 seconds, it succeeds. Subsequent boots succeed too, until the board is left off for a while.

       

      Deeper investigation shows that the FSBL hangs at the very first access to the BRAM, made by the f_read() function called from XFsbl_SdCopy() in xfsbl_sd.c.

      I have tried many changes to the FSBL, like adding a delay after the bitstream is loaded (even a 40s delay!), adding an extra check that the PL is ready, etc., but nothing I did changed the behavior described above.
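The post does not show how the "PL is ready" check was written. One generic shape for such a check is a bounded poll on a status register, sketched below in C. Everything here is an assumption for illustration: the accessor type, the function names, and the mask value are hypothetical, and the real status register address and bit would have to come from the device documentation. As reported above, checks of this kind did not change the behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register accessor; in the FSBL this would be a 32-bit MMIO read. */
typedef uint32_t (*reg_read_fn)(uintptr_t addr);

/* Poll until all bits in mask are set at addr, giving up after max_polls reads.
 * Returns true if the bits were seen set, false on timeout. */
static bool wait_for_bits(reg_read_fn read_reg, uintptr_t addr,
                          uint32_t mask, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if ((read_reg(addr) & mask) == mask) {
            return true;
        }
    }
    return false;
}

/* Host-side stand-in for a status register: reports "done" (0x8, an
 * arbitrary placeholder bit) from the third read onward. */
static uint32_t fake_reads = 0;

static uint32_t fake_status_read(uintptr_t addr)
{
    (void)addr;
    fake_reads++;
    return (fake_reads >= 3) ? 0x8u : 0x0u;
}
```

With the stand-in above, `wait_for_bits(fake_status_read, 0, 0x8u, 2)` times out, and a following call with a larger poll budget succeeds on the third read overall.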

      I have attached a copy of the FSBL debug messages as well.

      If you have any idea of what's going on, please tell me, I have exhausted ideas on my end.

       

      [This question was initially posted on Xilinx's forums, but marked as SPAM]

        • Re: ZU+ FSBL fails to access BRAM on "cold" start
          drozwood90

          Hi there,

           

          It is generally not a good idea to use BRAM in large amounts all tied to ONE point.  It can cause a lot of routing congestion.  While I do not think the amount you are talking about will be a problem, I just wanted to point that out to anyone reading this in the future.  It is actually one (among many) of the reasons the higher end UltraScale+ devices have UltraRAM.

           

          Regarding the issue you are seeing.  That is a curious one.  Nothing is jumping out to me directly.

          • Do you have the fan and heatsink installed?
          • Does applying pressure to the MPSoC part after it is cold booted help the issue?
          • If you run another program from OCM, or from something other than the DDR4/BRAM that you tested, does it also show similar behavior, or only the BRAM booted device?
          • If you cool the board with a "compressed air" duster can, does that make the issue return (after it has been running for a while and booting just fine)?
          • Long shot, but I've seen stranger things, have you tried to reseat the SOM to the Carrier card?  -- as odd as it is, I've seen this resolve far stranger issues

           

          --Dan

            • Re: ZU+ FSBL fails to access BRAM on "cold" start
              mboechat

              Hi Dan,

              Thanks for the quick and helpful answer.

              I wish I had the luxury of the UltraRAM, unfortunately, my end target is an even lower end (also lower power) 2EG chip. If I work really hard at it, I think I could fit my program in 128K, but that would require a fair bit of work, so I'm postponing this option.

               

              • Yes, the stock fan and heat sink are installed, and no modifications have been made to them. I noticed that the fan is mounted in a pull configuration, which is unusual to me, but I don't see an issue with that here.
              • Applying a force (>30N) down on the heat sink/fan does not seem to help.
              • Reseating the SOM in the IOCC did not change the issue. I tried it twice and used air duster.
              • Did one better and used an actual freeze spray: a short (<2s) application on the heat sink is enough to make the issue return on a warm system. The air duster can works too, it just requires a longer application (~10s).

               

              I could run the system in a temperature controlled environment to see what temperature is the tipping point. The first step of my program (when not running hello world) is to report the temperature measured by the sysmon, I see 35C or more, so it stands to reason that the tipping happens between 25C and 35C...

               

              drozwood90  wrote:

              • If you run another program from OCM, or from something other than the DDR4/BRAM that you tested, does it also show similar behavior, or only the BRAM booted device?

              I do not fully understand your question. When booting from the SD card, the OCM is not available for anything other than the FSBL, right? If I run a Hello World from the BRAM, booting from the SD card, I have exactly the same problem as when I boot the DDR test program. Given that I traced the issue to the first BRAM access by the FSBL, I don't think it matters what executable I am trying to load.

              I have tried to run the program using JTAG from the SDK. The issue is exactly the same when loading the bitstream and then running from BRAM (cold: fails, warm: succeeds). Here's the error message, which seems logical:

              *BUT* when I skip the bitstream upload (i.e. PL pre-programmed), it always runs, even very cold. So something might be going wrong with the bitstream upload or initialization. I will admit I don't have much knowledge of how the bus works and is arbitrated, but could that be the issue?

               

              And more importantly, how do I work around the issue?

                • Re: ZU+ FSBL fails to access BRAM on "cold" start
                  drozwood90

                  Hi there!

                   

                  No problem on the quick response and thank you for being so detailed.  It really does help us on the forums!

                   

                  It seems to me that there might be a loose ball on the init path.  That was my initial guess, but I wanted to at least get a good sense of the problem first.  That is also why I suggested pressing on it.  A loose ball can be deformed and pull up after shipping.  We do actually test this path at the factory with a rather intense factory test, and it seems that there might be something going on with it physically.

                   

                  Do you know who your local FAE is?  It might be good at this point to contact them so we can start to look at options to move you forward.  I do not want you to be spending a lot more time on this.  It just seems really off to me that you would need a delay prior to accessing the BRAM based design (although you already tried that), and besides, the chip should KNOW when the PL is loaded up.

                   

                  --Dan

                    • Re: ZU+ FSBL fails to access BRAM on "cold" start
                      mboechat

                      Thanks Dan,

                      I hope you are right and it is an isolated physical anomaly. For the time being, I simply solved the issue by having the FSBL check the chip temperature through the system monitor, and stall the bitstream load until the chip reaches 35C (any lower seems to be unreliable).
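The workaround described above is not shown in the thread. A rough C sketch of the idea: convert the raw sensor reading to degrees Celsius and poll until it reaches the threshold before loading the bitstream. The names `raw_to_celsius`, `wait_until_warm`, and the reader callback are hypothetical, and the conversion constants are an assumption about the on-chip sensor's transfer function that should be checked against the system monitor documentation. The fake sensor at the end exists only so the sketch can be exercised on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed transfer function for the on-chip temperature sensor
 * (16-bit raw code to degrees Celsius). Verify before relying on it. */
static float raw_to_celsius(uint16_t raw)
{
    return ((float)raw / 65536.0f) / 0.00196342531f - 280.2309f;
}

/* Hypothetical accessor returning the current raw temperature code. */
typedef uint16_t (*temp_read_fn)(void);

/* Poll the sensor until it reports at least threshold_c, or give up after
 * max_polls reads. Returns true once the threshold is reached. */
static bool wait_until_warm(temp_read_fn read_raw, float threshold_c,
                            uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (raw_to_celsius(read_raw()) >= threshold_c) {
            return true;
        }
    }
    return false;
}

/* Host-side fake sensor: starts near 23C and rises by 500 raw counts
 * (roughly 4C under the assumed conversion) on every read. */
static uint16_t fake_raw = 39000;

static uint16_t fake_temp_read(void)
{
    uint16_t r = fake_raw;
    fake_raw += 500;
    return r;
}
```

In a real FSBL the poll budget would probably be replaced by a delay between reads, and the caller would decide what to do on timeout. Those choices are left open here.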

                      I know my local FAE; he actually suggested I post my question here. I will follow up with him. While the issue is fascinating and I'd love to get to the bottom of it, I have to agree with you that I don't have the time necessary at the moment.

                      I will update this thread if anything noteworthy comes up.

                        • Re: ZU+ FSBL fails to access BRAM on "cold" start
                          mboechat

                          Quick follow-up a month later.

                          TL;DR: no definitive answer to the initial question, but most probably "try another board".

                           

                          The FAE was unable to provide me with a second board for comparison. To be fair, they got me a similar board, but with "engineering silicon", which means a lot of recompiling, and at that point I received our custom board and had to focus on that. So the mystery is NOT solved.

                           

                          Now, the custom board has a different Zynq on it (ZU2CG), a different RAM, etc., but I have loaded it with an equivalent Vivado project (i.e. using 256k of BRAM there too) and it runs smoothly. With the chip chilled to -16C it works fine. With the chip at 60C, it works fine. I am unable to create conditions to reproduce the issue on the custom board, and I have tried!

                          So I'm sorry if anybody else has the same issue, it seems that the answer is "try another board".