RPi 4 2 GB losing USB - Firmware issue?

  • Hello,

    Using nightly-20220131-71f8aac (same issue with nightly-20220130, though) on a RPi 4 2GB.

    Fresh install, sees USB keyboard and USB3 HDD fine. USB firmware version displayed.

    One or two reboots later, the disk and keyboard are gone, the USB firmware version reads 00000000, and it says it is using the bootloader EEPROM.

    https://pastebin.ubuntu.com/p/p67hyDpZFN/ (no kodi debug log, though, as I'm thinking that Kodi itself has nothing to do with it, the issue is lower in the stack)


    A fresh install fixes it, then it's back to no USB.

    Code
    BOOTLOADER: up to date
       CURRENT: Tue Jan 25 14:30:41 UTC 2022 (1643121041)
        LATEST: Thu Apr 29 16:11:25 UTC 2021 (1619712685)
       RELEASE: default (/usr/lib/kernel-overlays/base/lib/firmware/raspberrypi/bootloader/default)
                Use raspi-config to change the release.
    
      VL805_FW: Using bootloader EEPROM
         VL805: up to date
       CURRENT: 00000000
        LATEST: 00000000

    I have other RPis 4 with different hardware revisions (c03111, c03112) not having this problem.


    This one is apparently a brand new revision number: b03115

    Could this be an issue wrt firmware?

    Edited once, last by camelreef: Merged a post created by camelreef into this post. (February 1, 2022 at 1:32 PM).

  • Somehow you seem to be running an ancient firmware version (start.elf)

    Code
    Jan 18 11:35:43.585387 LibreELEC kernel: raspberrypi-firmware soc:firmware: Attached to firmware from 2021-09-30T19:21:54, variant start_x

    Try a fresh install, this should be 2022-01-06

    so long,

    Hias

  • Ah crap, my bad, wrong paste... I had played a bit... Sorry...

    A fresh install is what got me in a pickle.

    I have a fresh install, never rebooted, so USB is there

    Code
    [    0.053470] raspberrypi-firmware soc:firmware: Attached to firmware from 2022-01-06T15:40:20, variant start_x
    [    0.056806] raspberrypi-firmware soc:firmware: Firmware hash is 58e03c94953762222f2b838390dde54d46c38381

    yup... I've rebooted, and the whole USB is gone, empty lsusb and all.

    http://ix.io/3Ol5

    With this interesting part, probably:

    Nico

  • It looks like the firmware crashed. Can you try with a complete clean install on a separate sd card? Not sure if force_turbo and/or core_freq_min might play a role in that.

    Edit: if this doesn't help please also test with latest RPiOS and see if you get the same "Firmware transaction timeout" splat.
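To check whether either of the two options named above is active, something like the following may help. It is only a sketch: the helper name `clock_overrides` is made up for illustration, and the thread does not say which values were actually set. On LibreELEC, config.txt lives on the /flash partition, which is mounted read-only by default.

```shell
#!/bin/sh
# Print the active (uncommented) force_turbo / core_freq_min lines
# found in the config.txt file given as the first argument.
# Prints nothing (and returns non-zero) when neither option is set.
clock_overrides() {
    grep -E '^[[:space:]]*(force_turbo|core_freq_min)[[:space:]]*=' "$1"
}

# Typical use on a LibreELEC box (not executed here):
#   clock_overrides /flash/config.txt
# To edit the file, remount the partition read-write first:
#   mount -o remount,rw /flash
#   (comment out or delete the lines in /flash/config.txt)
#   mount -o remount,ro /flash
#   reboot
```

Commenting the lines out rather than deleting them makes it easy to restore them later if 50/60 Hz playback turns out to need them.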

    so long,

    Hias

  • OK, I've removed force_turbo and/or core_freq_min and USB is back. If this is the fix atm, I'm fine with it. I'm pretty confident that there won't be any 50/60 Hz content played on this, ever.

    I've rebooted the system a few times, it looks solid.

    http://ix.io/3Opp

    Code
    BOOTLOADER: up to date
       CURRENT: Tue Jan 25 14:30:41 UTC 2022 (1643121041)
        LATEST: Thu Apr 29 16:11:25 UTC 2021 (1619712685)
       RELEASE: default (/usr/lib/kernel-overlays/base/lib/firmware/raspberrypi/bootloader/default)
                Use raspi-config to change the release.
    
      VL805_FW: Using bootloader EEPROM
         VL805: up to date
       CURRENT: 000138a1
        LATEST: 000138a1
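For repeated reboot tests, the VL805 line of this output can be checked automatically. The sketch below assumes the output format shown in this thread; the function names `vl805_current` and `check_vl805` are hypothetical. In this thread an all-zero version coincided with USB being absent, but the script only reports the value and does not prove a cause.

```shell
#!/bin/sh
# Read rpi-eeprom-update style output on stdin and print the CURRENT
# value that follows the "VL805:" line (the "VL805_FW:" line is skipped).
vl805_current() {
    awk '
        /^[[:space:]]*VL805:/ { in_vl805 = 1; next }
        in_vl805 && /CURRENT:/ { print $2; exit }
    '
}

# Classify the value: empty, all zeros, or a real-looking version.
check_vl805() {
    cur=$(vl805_current)
    case $cur in
        '')     echo "no VL805 section found"; return 2 ;;
        *[!0]*) echo "VL805 firmware reported: $cur" ;;
        *)      echo "VL805 version reads as all zeros: $cur"; return 1 ;;
    esac
}

# Typical use on the Pi (not executed here):
#   rpi-eeprom-update | check_vl805
```

The non-zero exit status in the all-zero case makes it easy to call from a boot-time script that logs each failure.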

    I'll put the system through the whole workload (there is another machine using the mounted disk over its samba share) and see how it is.

    It appears that my remote samba client is still having issues, with the drive mounted from LE dropping out. I've even had LE totally freeze (no GUI, logged-in SSH session, samba transfers, etc...) on me for at least a full minute, before all came back.


    I see this on the LE machine:

    http://ix.io/3Oqv


    I am now doing a test from my client mounting a volume from my NAS. It doesn't hurt to check that the client is not the problem... (though I have other setups in production configured in exactly the same way that are working perfectly fine)

    Nico

    Edited 2 times, last by camelreef (February 2, 2022 at 1:35 PM).

  • Using the same client, doing exactly the same thing, under the same load, but using my NAS as a samba server and not the LE box, I have no issue, things are stable.

    This points more and more to that Pi4 with a brand new hardware revision having a problem.

    My next step is to take my SD card and stick it into another Pi 4 I have handy, an older hardware revision, and see if it is stable.

  • erm... Nope... Same issue with an older Pi 4 hardware revision...

    Has anything changed re. Samba recently or should I focus on the client?

    I have an exact replica of this setup running LE nightly-20220120-f7f2fd5 without any issue.

    I went back to nightly-20220125-ce61477 on the non prod setup, but same issue.

    I'm guessing that a test using the release LE10 would tell me.

    Edited 2 times, last by camelreef (February 2, 2022 at 6:09 PM).

  • you know what?

    Let's stick to the config.txt options crashing the firmware.

    The samba stuff is probably from the DietPi on the other machine... I'm reinstalling it fresh and will see...

  • Not really sure. There's been a recent firmware change affecting clocks https://github.com/raspberrypi/fi…70f31a98167eef6 - I plan to update kernel+firmware in LE later this week, so might be worth giving it a try after that.

    It could also be that the silicon in the RPi4 is marginal - a small overvoltage in config.txt might help.

    Also please test with RPiOS and latest rpi-update kernel.

    so long,

    Hias

    will do tomorrow. I need to unglue myself from that system and rejoin family life this evening!

    Thanks for your usual help and patience.

    Nico

  • It could also be that the silicon in the RPi4 is marginal - a small overvoltage in config.txt might help.

    over_voltage=1 breaks tons!

    http://ix.io/3OvR

    Kodi does start, the disk is not mounted, dmesg is full of errors...

    That hw revision appears to be a tad problematic...

    Code
    Hardware	: BCM2835
    Revision	: b03115
    Serial		: 10000000b7f3ee23
    Model		: Raspberry Pi 4 Model B Rev 1.5

    Nico

    Edited once, last by camelreef (February 3, 2022 at 11:09 AM).

  • Have you tested it with Raspberry Pi OS in a minimal USB configuration (no external hub), adding devices back one at a time? If it still fails, consider reporting the issue at https://github.com/raspberrypi/linux/ (they prefer testing on Raspberry Pi OS, but have been known to look into other OS issues; it really depends). Once you have Raspberry Pi OS up, you will probably want to use rpi-update to test the next branch (5.15). The external hub and hard drive are a question mark, since they may be over-drawing current from the USB ports (the design maximum is 1.2 A across all USB ports, from what I read). It could be your power adapter as well.

    I would be surprised if it is a hardware revision issue. At this point the hardware revisions should be fairly minor and well tested. Opening that issue and putting your speculation into it may reveal what the hardware change was and what the engineers think about that theory. It is likely an issue with the 5.15 kernel. If you were getting flip timeouts or DRM-related issues earlier when the firmware timed out, it wouldn't surprise me, because those seem to cause kernel soft-lockups/deadlocks. It seems like your earlier splat is from VC4 DRM.

    Perhaps the kernel push that HiassofT referred to will help, he pushed the PR yesterday so it may show up in the nightly soon.

  • No hub was ever involved, the HDD is plugged straight into the Pi. I've had the same HDD plugged into a Pi 4 in other setups like that without issue.

    The power adapter is a brand new official one. I could try another one I have in stock, eventually. I never had a lightning bolt on the screen.

    I'm just done setting up a fresh Raspberry Pi OS with a full upgrade, with Samba sharing my drive.

    I haven't run rpi-update yet.

    I'm now running the same workflow and monitoring the logs. So far so good, but I will give it time.

    Then I'll update and do the same.

    Nico

  • Things were running fine with a fresh and normally updated RPi OS, as in the previous message.

    So I ran rpi-update, which took me to another 5.10 kernel, not the 5.15 I was expecting. Firmware also appears to be the same.

    Should I do this instead?

    Code
    sudo BRANCH=next rpi-update
    
    or
    
    sudo rpi-update next

    Edited once, last by camelreef (February 3, 2022 at 1:09 PM).

  • My workflow/workload was running fine on the system gone through the vanilla rpi-update 5.10 kernel.

    So I went to the next branch, 5.15 kernel, same firmware.

    This is also very solid under the same workflow. The remote RPi3 has no issue with the CIFS/Samba mount, it never loses it. All is good in the world.

    I would say that this rules out the hardware's marginality, the power adapter, the disk, the network and the client's setup. None of those have changed.

    I have the same setup, RPi4, hard drive, network, etc. running elsewhere without issues at all. The only differences are that it's running nightly-20220120-f7f2fd5 (which I daren't upgrade atm) and it is a 1.2 hardware revision.

    So... what can I do to help hunt for the issue on LE? Use and abuse me!

    I'll start by trying today's nightly (20220203), then I can put the same SD card into a rev. 1.1 Pi 4 that I have access to. That may point at a software change since Jan. 20 if I have the same problem.

    Nico

  • Rev 1.5 RPi 4 - LE11 nightly-20220203-5347c5e

    And boom!, on the remote machine:

    Code
    Feb 03 14:16:06 grabber kernel: CIFS: VFS: \\kodiplayer.local sends on sock 000000001aa9464e stuck for 15 seconds
    Feb 03 14:16:06 grabber kernel: CIFS: VFS: \\kodiplayer.local Error -11 sending data on socket to server
    Feb 03 14:16:09 grabber kernel: CIFS: VFS: Send error in read = -11
    Feb 03 14:16:09 grabber kernel: CIFS: VFS: Send error in read = -11
    Feb 03 14:16:34 grabber kernel: CIFS: VFS: \\kodiplayer.local sends on sock 00000000807891b4 stuck for 15 seconds
    Feb 03 14:16:34 grabber kernel: CIFS: VFS: \\kodiplayer.local Error -11 sending data on socket to server
    Feb 03 14:16:50 grabber kernel: CIFS: VFS: \\kodiplayer.local sends on sock 00000000807891b4 stuck for 15 seconds
    Feb 03 14:16:50 grabber kernel: CIFS: VFS: \\kodiplayer.local Error -11 sending data on socket to server
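To compare runs (different nightlies, different boards), it can help to count these stalls rather than eyeball the log. The following is a sketch that assumes the kernel log line format shown above; `cifs_stalls` is a made-up helper name. For reference, the -11 in these messages is a negative errno, and errno 11 on Linux is EAGAIN.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print, per server, how many
# CIFS "sends on sock ... stuck for N seconds" events were logged.
cifs_stalls() {
    awk '
        /CIFS: VFS: .* stuck for [0-9]+ seconds/ {
            # The server name is the field right after "VFS:".
            for (i = 1; i < NF; i++)
                if ($i == "VFS:") { srv = $(i + 1); break }
            stuck[srv]++
        }
        END { for (s in stuck) printf "%s %d\n", s, stuck[s] }
    '
}

# Typical use on the client machine (not executed here):
#   journalctl -k | cifs_stalls
```

Fed the eight lines above, it reports three stalls for \\kodiplayer.local.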

    IIUC, the RPi OS that worked well was using the Tue 25 Jan 14:30:41 UTC 2022 (1643121041) firmware.

    LE11 still uses the Tue Jan 25 14:30:41 UTC 2022 (1643121041) firmware.

    Do I need a new nightly to change this, or can I force it on my own quickly?

    As ever, thanks for your help!

    Nico