We have been using the AVNet Smartedge IIoT Gateway for a project for a while now. We have purchased a couple of dozen of these devices and use them with Ethernet, CAN bus, HDMI and other interfaces.
We are running a custom Raspbian image, modified to boot correctly on this hardware, with the appropriate kernel modules updated and "grabbed" from the official AVNet image. The main reason for using standard Raspbian is that the AVNet image failed to set the screen resolution correctly on our screens (1024x600), whereas Raspbian did not.
We have detected a problem which we cannot understand and which is causing us several high-impact issues. In most (but not all) of our setups, the USB ports on the device begin malfunctioning, and the only solution is to reboot the system. The issue happens seemingly at random (e.g. after a period of inactivity, with only some CAN bus messages being sent). USB devices start acting up and respond extremely slowly. Unplugging them and plugging them back in does nothing. Using https://github.com/mvp/uhubctl to power cycle the ports also seems to do nothing (once it seemed to fix the issue, but in general it doesn't). This can happen with just a single USB device plugged in (a touchscreen), or with multiple USB devices. It might be worth noting that all of the devices have an active modem installed, and some of them use different modems (in truth, this might be the source of the problem, but we haven't been able to check yet).
There have been several attempts at debugging this issue to no avail. The only "seemingly suspect" message in the logs is the following:
[147490.486924] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
[147490.486938] smsc95xx 1-1.1:1.0 eth0: Error reading MII_ACCESS
[147490.486947] smsc95xx 1-1.1:1.0 eth0: MII is busy in smsc95xx_mdio_read
[147490.486957] smsc95xx 1-1.1:1.0 eth0: Failed to read MII_BMSR
[147490.486991] smsc95xx 1-1.2.1:1.0 eth1: Failed to read reg index 0x00000114: -110
[147490.487000] smsc95xx 1-1.2.1:1.0 eth1: Error reading MII_ACCESS
[147490.487009] smsc95xx 1-1.2.1:1.0 eth1: MII is busy in smsc95xx_mdio_read
[147490.487017] smsc95xx 1-1.2.1:1.0 eth1: Failed to read MII_BMSR
These messages seem to appear at random on all devices exhibiting this behaviour. However, they sometimes also appear on "healthy" devices.
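To compare how often these messages show up on failing versus healthy devices, the kernel log could be scanned with something like the following. This is only a rough sketch: it assumes one kernel message per line in the format quoted above, and the function names (parse_smsc95xx_errors, count_timeouts_by_interface) are made up for illustration. It looks only for the "-110" register read failures (errno 110 is ETIMEDOUT on Linux). Since the same messages also occur on healthy devices, a match is at best a hint and not proof that the USB ports have failed.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
# [147490.486924] smsc95xx 1-1.1:1.0 eth0: Failed to read reg index 0x00000114: -110
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+smsc95xx\s+(?P<usb_path>\S+)\s+(?P<iface>\w+):\s+(?P<msg>.*)"
)


def parse_smsc95xx_errors(log_text):
    """Return (timestamp, usb_path, interface) for each smsc95xx -110 line.

    Hypothetical helper: only lines whose message ends in ': -110'
    (a timed-out register read) are kept; other smsc95xx lines are ignored.
    """
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # not an smsc95xx message in the expected format
        if match.group("msg").strip().endswith(": -110"):
            events.append(
                (float(match.group("ts")), match.group("usb_path"), match.group("iface"))
            )
    return events


def count_timeouts_by_interface(log_text):
    """Count -110 register read failures per network interface."""
    return Counter(iface for _, _, iface in parse_smsc95xx_errors(log_text))
```

The text passed in could be the captured output of dmesg or the contents of a saved kernel log. Collecting these counts from every device at regular intervals would at least show whether the failing units log the timeouts noticeably more often than the healthy ones.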
We've already tried setting dwc_otg.fiq_fix_enable=0 but it did not help.
Does anyone have an idea as to what might be happening? This issue is very aggravating, because the only fix is to reboot the machines, and we don't even have a way of determining exactly when a system starts failing (the only way is to physically test a USB device such as a keyboard).