LE10 rpi4-4gb oom errors

  • Setup

    • RPi4 4GB
    • 12TB USB 3.0 Drive
    • LE 10 beta
    • Docker

    I am encountering constant oom events when there is plenty of available ram. The oom-killer can fire as often as every 30 seconds, and it loves to kill kodi. I know I can tell it to avoid killing kodi, but the underlying problem would persist. I've encountered this before on 9.2.6, but not nearly as often as on the new build.


    I do a fresh install of LE10, attach my external USB3 drive, add my media libraries and add any docker image that generates lots of disk io (transmission, rsync, medusa, etc), and within a few minutes the oom-killer begins decimating my processes every few minutes. My rpi4 memory usage never really goes over 1 GB or so (out of 4GB). I have seen the oom events occur even before docker and Kodi load. I don't believe that memory fragmentation is a problem, as I always have plenty of 4096kB blocks of DMA and HiMem. I have been beating my head against this, and I believe it is possibly related to the Raspberry Pi GitHub issue "USB3 disk writes trigger OOM reaper on Pi 4". Any suggestions on how best to try to debug this?
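For anyone wanting to repeat the fragmentation check, here is a minimal sketch that summarises a buddyinfo-style file per zone. The function name `buddy_summary` is made up for illustration, and the arithmetic assumes 4 KiB pages (so the last column, order 10, corresponds to the 4096kB blocks mentioned above). It is not an official LE tool, just one way to read the numbers.

```shell
#!/bin/sh
# buddy_summary (hypothetical helper): per-zone free memory from buddyinfo.
# Assumption: 4 KiB pages. Column 5 holds order-0 blocks, so order = column - 5.
buddy_summary() {
    awk '{
        total_kb = 0
        for (i = 5; i <= NF; i++) {
            order = i - 5
            # each block of this order spans 2^order pages of 4 kB
            total_kb += $i * (2 ^ order) * 4
        }
        printf "%s free=%dkB largest_order_blocks=%d\n", $4, total_kb, $NF
    }' "${1:-/proc/buddyinfo}"
}
```

Calling `buddy_summary` with no argument reads /proc/buddyinfo. If the largest-order count stays well above zero while the oom-killer fires, fragmentation looks like an unlikely cause.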


    I really love the design philosophy of LE and I would rather not shift to a normal distro, but other users report that a 64 bit kernel alleviates many oom errors. Would it even be possible for me to compile a 64 bit version of LE for the rpi4, knowing that this would force software x264 decoding and loss of widevine?


    Things I have tried:

    • Enabling cgroups and limiting the docker images to only 256MB (limiting works, but oom persists)
    • "sysctl vm.overcommit_memory=2" This causes the system to crash during boot
    • cron job of "echo 4 > /proc/sys/vm/drop_caches" every minute. This reduces the frequency of oom events, but it doesn't stop them.
    • mucking about with "vm.extfrag_threshold, vm.vfs_cache_pressure, swappiness": no real change
    • Adding a massive swap file: It never gets used.
    • Fresh LE10 install on both a large SD card and a usb 3.0 attached ssd.
    • Even without docker, I believe I have seen one or two oom events, but they are rare.
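To compare how often the oom-killer fires under each of the attempts above, a rough sketch like the following can tally victims from the kernel log. The helper name `oom_victims` is hypothetical, and it assumes the kernel prints the usual "Out of memory: Killed process PID (name)" line (older kernels print "Kill process"); the exact wording may differ on other kernel versions.

```shell
#!/bin/sh
# oom_victims (hypothetical helper): reads kernel log text on stdin and
# prints "count name" for every process the oom-killer has killed.
# Assumption: log lines look like "Out of memory: Killed process 812 (kodi.bin) ...".
oom_victims() {
    sed -n 's/.*Out of memory: Kill[ed]* process [0-9]* (\([^)]*\)).*/\1/p' |
        sort | uniq -c |
        awk '{ count = $1; $1 = ""; sub(/^ /, ""); print count, $0 }'
}
```

Typical use would be `dmesg | oom_victims` before and after changing a setting, so the effect of each tweak can be judged by numbers rather than by feel.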

    Thanks!


  • I've never personally seen this before and would likely consider it a kernel bug.

  • Edit: Disregard this, this only delayed the problems for an extra hour or so before the oom-killer kicked in.



    I have tried tons of things to fix this, and I believe I have finally cracked it. I am fairly certain that adding these two lines to my autostart.sh has stopped the condition that summons the oom-killer. Naturally, there is still some underlying problem that is merely sidestepped here.


    Code: /storage/.config/autostart.sh
    sysctl vm.dirty_background_ratio=5
    sysctl vm.dirty_ratio=8


    Details:

    I found that before the oom-killer was invoked, the kernel process kswapd0 was using 100% CPU. I found that if the caches were flushed via echo 4 > /proc/sys/vm/drop_caches, it resolved the problem temporarily. This led me to mess with the cache ratios to force the system to keep the caches as empty as possible.
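To see whether the lower ratios really keep the write-back cache small, something like this sketch could be left running while the disk is busy. The function name `dirty_snapshot` is invented for this example; it only reads the standard Dirty and Writeback fields from a meminfo-style file.

```shell
#!/bin/sh
# dirty_snapshot (hypothetical helper): print the amount of dirty and
# in-flight write-back memory from a meminfo-style file.
dirty_snapshot() {
    awk '
        $1 == "Dirty:"     { dirty = $2 }   # pages waiting to be written out
        $1 == "Writeback:" { wb = $2 }      # pages currently being written
        END { printf "Dirty=%dkB Writeback=%dkB\n", dirty, wb }
    ' "${1:-/proc/meminfo}"
}
```

For example, `while sleep 5; do echo "$(date +%T) $(dirty_snapshot)"; done` logs a value every five seconds, which can then be lined up against the oom timestamps in dmesg. The active settings can be read back with `sysctl -n vm.dirty_background_ratio` and `sysctl -n vm.dirty_ratio`.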

  • I have the exact same config, and the exact same issue.


    I've noticed it happens when there is a lot of I/O activity to my external HDD, either when downloading big files or when playing back those files through Kodi. It does seem related to the issue you linked on the raspberry github.


    I'm still investigating this too, but you're not alone !


    EDIT:


    Tried to put arm_64bit=1 in config.txt to enable 64bit kernel mode, but it didn't work (RPi stuck after boot, LE doesn't seem to support that).
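When testing `arm_64bit=1` on any image, it helps to confirm which kernel actually booted. A small sketch, assuming `uname -m` reports `aarch64` for a 64-bit kernel and `armv7l` for the 32-bit one; the helper name `kernel_bits` is hypothetical.

```shell
#!/bin/sh
# kernel_bits (hypothetical helper): classify the running kernel as 32 or 64 bit.
# Takes an optional machine string, otherwise asks uname.
kernel_bits() {
    case "${1:-$(uname -m)}" in
        aarch64|arm64) echo 64 ;;
        armv7l|armv6l) echo 32 ;;
        *)             echo unknown ;;
    esac
}
```

Running `kernel_bits` after a reboot shows whether the config.txt change took effect.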

  • For what it's worth, I mitigated the issue by moving the external HDD to a USB 2.0 port and limiting my download speed to 10 MB/s. It seems the best way to prevent the issue is to not do a lot of I/O, so not ideal at all, but at least I can use the system again. I'm still having trouble unrar-ing large 30 GB files though.

  • I am not even sure it is limited to USB usage. I set up a second rpi4 as an nfs, and ran a docker torrent client on LibreELEC accessing its share over samba, and I still got the error. This is backed up by this comment on github, where a user is able to reliably trigger the error with a script that uses a virtual filesystem. Oh, I see that you are already participating in that issue thread.


    My rpi4 running raspbian with the 64-bit kernel handles these docker containers without breaking a sweat. I would very much like to address this somehow, but at this point, I am not sure where to start. It seems that this is unlikely to be fixed in the 32-bit kernel upstream when arm_64bit=1 works in most use cases.