Posts by frakkin64

    Memory leak associated to Youtube when MPEG-DASH is enabled · Issue #520 · anxdpanic/plugin.video.youtube (github.com)


    I couldn't reproduce it with a simple requests.get loop in Python, so it's perhaps related to the embedded interpreter or Kodi built-ins bound to the embedded interpreter.

    This was the kludge test script I tried via command line:

    So there is a 60-second loop in the YouTube add-on that checks whether httpd is running; the loop is enabled if you use MPEG-DASH or the API configuration page. I typically use MPEG-DASH; I turned that off in settings and the leak was gone. I turned it back on (with an enable/disable toggle of the YouTube add-on) and the leak returned.

    I suspect the act of "pinging the httpd server" is what causes the problem. The loop does two things:

    1. uses requests to do an http get to the internal http service running for youtube addon

    2. restarts httpd service when request fails (not 204)

    I really doubt it's #2, because otherwise I don't think YouTube would work at all. IIRC ISA uses httpd to communicate back to the requesting player for some stuff (I remember that from the Netflix add-on). Perhaps the requests library is leaking some sort of resource?
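
A minimal sketch of the two-step loop described above, to make the suspected code path concrete. It is not the add-on's code: the standard library's urllib is swapped in for requests so it runs without extra packages, and PingHandler, ping, check_once and the /ping path are hypothetical names chosen for illustration.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PingHandler(BaseHTTPRequestHandler):
    """Stand-in for the add-on's internal httpd: answers every GET with 204."""

    def do_GET(self):
        self.send_response(204)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet


def ping(url, timeout=2.0):
    """Return True if the server answers 204, False on any other status or error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 204
    except (urllib.error.URLError, OSError):
        return False


def check_once(url, restart):
    """One iteration of the loop: step 1 pings, step 2 restarts on failure."""
    ok = ping(url)
    if not ok:
        restart()
    return ok


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), PingHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/ping" % server.server_port
    print(check_once(url, restart=lambda: print("restarting httpd")))
    server.shutdown()
```

Calling check_once once a minute from inside Kodi's embedded interpreter, rather than from a command-line Python, would be the closer reproduction of what the add-on does.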

    At idle it is a ~500 KB memory leak every minute (it's pretty regular), about 30 MB leaked per hour. It is either add-on or "feature" related, or it depends on whether you actually start doing something. The aw device is just sitting idle at the home screen, nothing done there, and it's not showing any leakage. The RPi4 device has been used to play back PVR, videos and YouTube, and it is routinely leaking ~500 KB every minute.

    memory monitoring

    Here is the script mm.sh:

    http://ix.io/4GIb

    Set permissions to 0700 and run it with ./mm.sh | tee -a mm.csv
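
For anyone who cannot fetch the linked script, here is a rough sketch of the same kind of monitor. It is not a port of mm.sh, whose contents are not reproduced here. It assumes a Linux /proc filesystem and samples the VmRSS field of /proc/<pid>/status once a minute; the file name mm.py and the function names are hypothetical.

```python
import csv
import sys
import time


def parse_rss_kb(status_text):
    """Extract VmRSS (in kB) from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # kernel threads have no VmRSS line


def sample(pid):
    """Read the current resident set size of a process in kB (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_rss_kb(f.read())


def monitor(pid, interval=60, count=None, out=sys.stdout):
    """Write 'unix_time,rss_kb' rows every `interval` seconds."""
    writer = csv.writer(out)
    n = 0
    while count is None or n < count:
        writer.writerow([int(time.time()), sample(pid)])
        out.flush()  # so that tee sees each row immediately
        n += 1
        if count is None or n < count:
            time.sleep(interval)


if __name__ == "__main__":
    # Usage: python3 mm.py <pid of kodi.bin> | tee -a mm.csv
    if len(sys.argv) > 1:
        monitor(int(sys.argv[1]))
```

The difference between consecutive rss_kb values then gives the per-minute growth directly.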

    The Ethernet issue is related to commit f8e03eae5; with that reverted it is working. As a workaround, rmmod dwmac_sun8i; modprobe dwmac_sun8i will recover Ethernet.

    http://ix.io/4GGs - for full dmesg (which includes an unload and reload of the module noted above).

    [   15.837181] NETDEV WATCHDOG: eth0 (dwmac-sun8i): transmit queue 0 timed out 5584 ms

    After this splat, the network is super slow, with lots of soft hang-ups. As for prove locking, it proved nothing (it locked up with nothing useful logged). So I am now running stock + performance governor, and if that yields any success I will move on to tweaking the transition latency and dropping some of the OPPs to match the BSP.

    frakkin64 if you don't mind messing with DT, there are some differences between mainline and bsp worth testing. BSP DT has clock latency set to 2000000 and it's missing opp points for 1608 MHz and 1704 MHz. Would you mind testing this? It's all in sun50i-h6-cpu-opp.dtsi.

    Yes, can do. I just pulled down the BSP to take a look at the DT. It's quite a swing from ~244ms to 2s for transition latency.

    Edit: I guess it's more like 244us to 2ms. It seems to me this clock-latency-ns DT property is largely informational? I'll try dropping the two frequency points, but I get the impression that with BSPs they get to a point where it works and call it good enough.
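
The unit correction is easy to check, since clock-latency-ns is given in nanoseconds. A quick sketch: the BSP value is the 2000000 quoted above, while 244000 is only an assumed round figure standing in for "~244us", not a value read from the mainline DT.

```python
def ns_to_us(ns):
    """Nanoseconds to microseconds."""
    return ns / 1_000


def ns_to_ms(ns):
    """Nanoseconds to milliseconds."""
    return ns / 1_000_000


bsp_latency_ns = 2_000_000     # BSP value quoted above
mainline_latency_ns = 244_000  # assumption: "~244us" as a round number

print(ns_to_ms(bsp_latency_ns))       # 2.0 (ms)
print(ns_to_us(mainline_latency_ns))  # 244.0 (us)
print(bsp_latency_ns / mainline_latency_ns)  # how much larger the BSP value is
```

So the BSP figure is roughly eight times the mainline one, not the thousand-fold jump that ms versus s would suggest.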

    So far it was just CONFIG_PROVE_LOCKING=y, and the Ethernet driver really hates it :). The network connections are super-super slow. I think I read it is a bit heavy; oddly, the shell via serial is very responsive. I think there may also be a more generic lock debugging option that just logs locks acquired/released, whereas prove locking is supposed to check for deadlocks and report them before the system actually deadlocks.

    I will probably just let it sit there and hope there is a splat or something on the serial console. If that doesn't produce anything meaningful then I will try running with performance governor on a nightly build and see what happens.

    I ended up going into a Linux source tree, copying over the config, running menuconfig, then copying the minimal adjustment back.

    Random thought .. I wonder if something like this is needed?

    Possibly, but it seems like there is definitely some sort of dependency/contention between the schedutil governor and kodi. I decided to go with ondemand and it did freeze, so now I am back to testing performance (which I think is full speed?). So it's either a combination of scheduler + IOMMU disabling, or it didn't run long enough, or it is frequency switching.

    I am trying to enable lockdep & prove_locking now. I thought all the dependencies were there, but the build just stripped those config options out of the kernel config. Can you trigger the config target via scripts/build, or do you usually do that outside of the LE build system?

    jernej I ran for 31.5 hours using the performance governor instead of schedutil, and no problems, no lock ups. I just rebuilt on master with just one patch really for H6, which is changing the governor to performance instead of schedutil. I'll let you guys know how that goes.

    There is a possibly related discussion on LKML. It's really about CPU hotplug (I guess when they turn off performance cores vs low-power cores in some of the Android phones), but it suggests there is a potential lock contention between schedutil & cgroups.

    [PATCH] sched/schedutil: Fix deadlock between cpuset and cpu hotplug when using schedutil (kernel.org)

    It's just speculation at this point, and it seems like that discussion sort of died out.

    Might be worth testing with a clean kodi installation without addons.

    Pretty basic addons here, on the RPi4 it's just the Libretro stuff, pvr.hts, youtube and plutotv. On the OPi3 it's even less just pvr.hts and youtube, plus the usual dependencies.

    I suppose it could be pvr.hts or YouTube/InputStream Adaptive. Nothing sketchy; I can't be bothered with that crap, it breaks all the time and ISPs are going to be banning those illegal streams in the US anyway.

    I have zachmorris repo on the RPi4 for IAGL, but really haven't used any of the RetroPlayer stuff, the nostalgia wore off pretty quickly :).

    Noticing with the latest LE12 nightly that there appears to be a memory leak, seeing oom killer on RPi4 quite a bit more frequently.

    http://ix.io/4Gxm - journalctl oom-killer reports

    http://ix.io/4Gxn - pmap of kodi.bin (3.8G virt, 1.1G rss at the moment -- will probably be killed soon)

    Seems like a similar issue on Allwinner H6 boards. Anyone else seeing this problem?

    RPi4 - Built on commit 09641de8d37c324555542c1ba62992ce0d89d5e3 (looks like RPi4 is using Kodi commit aaef1a0d1e37ff87361011fd48c89d9e69ec9a9d, just before alpha 3)

    OPi3lts - Built on commit 699f92cf62ce318a80d28d6e06ab55661e6c6c03 (this is using 21 alpha 3)

    Alpha 2 was OK, didn't notice any problems with that.

    Anyway, I went through CedarX vendor lib and found some differences regarding MPEG2 setup. You can try this kernel patch:

    http://ix.io/4Fsv

    However, I couldn't find any real world difference. Maybe something can be done on ffmpeg side.

    Yeah, that patch didn't make much of a difference with MPEG2. I'm OK with software decoding; most of the MPEG2 content is just OTA broadcasts and it's usually not much more than 1080, which neither the OPi3 nor the RPi4 has a problem decoding in software.

    Note that this is just a test; the IOMMU benefits are big. It protects unallocated memory from being used by HW (in other words, it prevents HW from corrupting memory) and it allows any amount of free memory to be allocated to video decoding (otherwise only CMA memory can be used, which is limited in size).

    So far no full lock-ups (14 hours uptime). Kodi has locked up but not the shell (so this is different), and this is in dmesg. It seems to be a conflict between sugov (some sort of kernel thread or work queue?) and kodi.bin:

    http://ix.io/4Gr3



    I'm trying the performance governor, when I tried to change it while it was in this state the shell would hang/deadlock. I rebooted, did:

    echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor

    We will see how that goes, the sugov kthread is related to schedutil governor.
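
The same switch can be scripted. The sketch below assumes the cpufreq sysfs layout used in the command above plus the standard scaling_available_governors file; get_governor and set_governor are hypothetical helper names, and writing to sysfs needs root.

```python
from pathlib import Path

# Assumption: policy0 covers the cores of interest, as in the echo command above.
POLICY0 = Path("/sys/devices/system/cpu/cpufreq/policy0")


def get_governor(policy_dir=POLICY0):
    """Return the name of the currently active governor."""
    return (Path(policy_dir) / "scaling_governor").read_text().strip()


def set_governor(name, policy_dir=POLICY0):
    """Switch governor, refusing names the kernel does not list as available."""
    policy_dir = Path(policy_dir)
    available_file = policy_dir / "scaling_available_governors"
    if available_file.exists():
        available = available_file.read_text().split()
        if name not in available:
            raise ValueError("governor %r not in %s" % (name, available))
    (policy_dir / "scaling_governor").write_text(name + "\n")
    return get_governor(policy_dir)  # read back to confirm the change took


if __name__ == "__main__":
    if POLICY0.exists():
        print("current governor:", get_governor())
```

Reading the value back matters here: a governor that is not built into the kernel is rejected by the write, and the old setting stays in place.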

    I have one possible explanation: IOMMU. The H6 driver doesn't have locking implemented in some functions, but others do. Without it, race conditions are possible when allocating and freeing memory during video decoding and rendering.

    I had a similar experience as gadget_guy, using LE 12 build. I'll give it a shot disabling IOMMU as well.

    IDK if the file system causes the crash. Maybe it's a low memory situation. Can you also please enable KASAN? It should print an error report in dmesg whenever a memory issue is detected.

    It's a bit all over the place, SError, then unimplemented instruction, random freezing. Feels like a memory problem. I'll compile with KASAN as well and see what happens.