Posts by frakkin64

    At idle it leaks roughly 500 KB every minute (it's pretty regular), so about 30 MB per hour. It is either add-on or "feature" related, or it depends on whether you actually start doing something. The Allwinner (aw) device is just sitting idle at the home screen with nothing done on it, and it's not showing any leakage. The RPi4 device has been used to play back PVR, videos and YouTube, and it is routinely leaking ~500 KB every minute.

    memory monitoring

    Here is the script mm.sh:

    http://ix.io/4GIb

    You set permissions to 0700 and run it with ./mm.sh |tee -a mm.csv
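
    The linked mm.sh is not reproduced here. The following is only a minimal sketch of what a script along those lines might do, assuming it samples the resident set size of kodi.bin at a fixed interval and prints CSV rows. The function names (rss_kb, monitor), the choice of VmRSS from /proc/PID/status and the 60 second default are assumptions for illustration, not the contents of the original script.

```shell
#!/bin/sh
# Hypothetical sketch, not the original mm.sh.

# Print the VmRSS value (in kB) from a /proc/<pid>/status style file.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Emit a CSV header, then one "epoch_seconds,rss_kb" row per interval
# for as long as the process exists.
monitor() {
    pid=$1
    interval=${2:-60}
    echo "time,rss_kb"
    while [ -r "/proc/$pid/status" ]; do
        echo "$(date +%s),$(rss_kb "/proc/$pid/status")"
        sleep "$interval"
    done
}

# Example: monitor "$(pidof kodi.bin)" 60 | tee -a mm.csv
```

    With one row per minute, the difference between consecutive rss_kb values gives the per-minute growth directly.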

    The Ethernet issue is related to commit f8e03eae5; with that commit reverted, Ethernet is working. As a workaround, rmmod dwmac_sun8i; modprobe dwmac_sun8i will recover Ethernet.

    http://ix.io/4GGs - for full dmesg (which includes an unload and reload of the module noted above).

    [   15.837181] NETDEV WATCHDOG: eth0 (dwmac-sun8i): transmit queue 0 timed out 5584 ms

    After this splat, the network is super slow, with lots of soft hang-ups. As for prove locking, it proved nothing (it locked up with nothing useful logged). So I am now running stock + performance governor, and if that yields any success then I will go into tweaking the transition latency & dropping some of the OPPs to match the BSP.

    frakkin64 if you don't mind messing with DT, there are some differences between mainline and bsp worth testing. BSP DT has clock latency set to 2000000 and it's missing opp points for 1608 MHz and 1704 MHz. Would you mind testing this? It's all in sun50i-h6-cpu-opp.dtsi.

    Yes, can do. I just pulled down the BSP to take a look at the DT. It's quite a swing from ~244ms to 2s for transition latency.

    Edit: I guess it's more like 244us to 2ms. It seems to me this clock-latency-ns DT property is largely informational? I'll try dropping the two frequency points, but I get the impression that with BSPs they get to a point where it works and call it good enough.
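
    The unit mix-up is easy to check: clock-latency-ns is in nanoseconds, so the BSP value of 2000000 is 2000 us, i.e. 2 ms. The cpufreq core also reports the latency it ended up with in cpuinfo_transition_latency under the policy directory. A small sketch, where the helper names ns_to_us and show_latency are made up for illustration:

```shell
#!/bin/sh
# Convert a latency in nanoseconds to whole microseconds.
ns_to_us() {
    echo $(( $1 / 1000 ))
}

# Print the transition latency reported in a cpufreq policy directory.
show_latency() {
    ns=$(cat "$1/cpuinfo_transition_latency")
    echo "$ns ns = $(ns_to_us "$ns") us"
}

# Example: show_latency /sys/devices/system/cpu/cpufreq/policy0
```

    Running show_latency before and after the DT change is a quick way to confirm the new value was actually picked up.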

    So far it was just CONFIG_PROVE_LOCKING=y, and the Ethernet driver really hates it :). The network connections are super slow. I think I read that it is a bit heavy, though oddly the shell via serial is very responsive. I think there may also be a more generic lock debugging option that just logs locks acquired/released, whereas prove locking is supposed to detect potential deadlocks and report them before they actually happen.

    I will probably just let it sit there and hope there is a splat or something on the serial console. If that doesn't produce anything meaningful then I will try running with performance governor on a nightly build and see what happens.

    I ended up going into a Linux source tree, copying over the config, running make menuconfig, then copying the minimally adjusted config back.

    Random thought .. I wonder if something like this is needed?

    Possibly, but it seems like there is definitely some sort of dependency/contention between the schedutil governor and Kodi. I decided to go with ondemand and it did freeze, so now I am back to testing performance (which I think is full speed?). So it's either a combination of scheduler + IOMMU disabling, or it didn't run long enough, or it is frequency switching.

    I am trying to enable lockdep & prove_locking now. I thought all the dependencies were there, but the kernel build just stripped those config options out. Can you trigger the config target via scripts/build, or do you usually do that outside of the LE build system?
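
    Options that silently vanish from the final config usually mean an unmet dependency. A minimal sketch for checking which of the relevant options survived in a given config file follows. The helper names are made up, and the option list is an assumption (PROVE_LOCKING depends on DEBUG_KERNEL and selects LOCKDEP in the kernels I have looked at; other dependencies may apply to a particular tree). How this fits into the LE build system is not covered here.

```shell
#!/bin/sh
# Succeed if CONFIG_<name>=y is present in the given kernel config file.
config_enabled() {
    grep -q "^CONFIG_$2=y\$" "$1"
}

# Report which lock debugging options are set in a config file.
check_lockdep() {
    for opt in DEBUG_KERNEL LOCKDEP PROVE_LOCKING; do
        if config_enabled "$1" "$opt"; then
            echo "$opt=y"
        else
            echo "$opt missing"
        fi
    done
}

# Example, inside a kernel source tree:
#   scripts/config --file .config --enable DEBUG_KERNEL --enable PROVE_LOCKING
#   make ARCH=arm64 olddefconfig
#   check_lockdep .config
```

    If PROVE_LOCKING is still missing after olddefconfig, the dependency lines shown for the symbol in menuconfig (search with /) say what else has to be enabled.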

    I'm trying the performance governor. When I tried to change it while it was in this state, the shell would hang/deadlock. So I rebooted and did:

    echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor


    We will see how that goes; the sugov kthread is related to the schedutil governor.
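
    A slightly more defensive version of the same step, as a sketch: it checks scaling_available_governors before writing and reads the value back afterwards. The function name set_governor is made up for illustration.

```shell
#!/bin/sh
# Set a governor on one cpufreq policy directory, but only if the kernel
# lists it as available. Prints the governor that is active afterwards.
set_governor() {
    policy=$1
    gov=$2
    if ! grep -qw "$gov" "$policy/scaling_available_governors"; then
        echo "governor $gov not available" >&2
        return 1
    fi
    echo "$gov" > "$policy/scaling_governor"
    cat "$policy/scaling_governor"
}

# Example: set_governor /sys/devices/system/cpu/cpufreq/policy0 performance
```

    The setting does not survive a reboot, so for a long soak test it has to be reapplied at boot or built in as the default governor. The performance governor keeps the policy at the highest frequency allowed by scaling_max_freq.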

    jernej I ran for 31.5 hours using the performance governor instead of schedutil, and no problems, no lock ups. I just rebuilt on master with just one patch really for H6, which is changing the governor to performance instead of schedutil. I'll let you guys know how that goes.

    There is a possibly related discussion on LKML. It's really about CPU hotplug (I guess when they turn off performance cores vs low-power cores in some of the Android phones), but it suggests there is potential lock contention between schedutil & cgroups.

    [PATCH] sched/schedutil: Fix deadlock between cpuset and cpu hotplug when using schedutil (kernel.org)

    It's just speculation at this point, and it seems like that discussion sort of died out.

    Might be worth testing with a clean kodi installation without addons.

    Pretty basic addons here, on the RPi4 it's just the Libretro stuff, pvr.hts, youtube and plutotv. On the OPi3 it's even less just pvr.hts and youtube, plus the usual dependencies.

    I suppose it could be pvr.hts or Youtube/InputStream Adaptive. Nothing sketchy, can't be bothered with that crap it breaks all the time and ISPs are going to be banning those illegal streams in the US anyways.

    I have zachmorris repo on the RPi4 for IAGL, but really haven't used any of the RetroPlayer stuff, the nostalgia wore off pretty quickly :).

    Noticing with the latest LE12 nightly that there appears to be a memory leak, seeing oom killer on RPi4 quite a bit more frequently.

    http://ix.io/4Gxm - journalctl oom-killer reports

    http://ix.io/4Gxn - pmap of kodi.bin (3.8G virt, 1.1G rss at the moment -- will probably be killed soon)
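
    For anyone wanting to check their own box, a small sketch for pulling the same information out of the kernel log. It assumes the usual kernel wording ("invoked oom-killer" and "Out of memory: Killed process PID (name)"); the function names are made up.

```shell
#!/bin/sh
# Count kernel log lines (from stdin) where the OOM killer was invoked.
count_oom() {
    grep -ci 'invoked oom-killer'
}

# Print the name of each process the OOM killer reports as killed (from stdin).
oom_victims() {
    sed -n 's/.*Out of memory: Killed process [0-9]* (\([^)]*\)).*/\1/p'
}

# Example:
#   journalctl -k --no-pager | count_oom
#   journalctl -k --no-pager | oom_victims
```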

    Seems like a similar issue on Allwinner H6 boards. Anyone else seeing this problem?

    RPi4 - Built on commit 09641de8d37c324555542c1ba62992ce0d89d5e3 (looks like RPi4 is using Kodi commit aaef1a0d1e37ff87361011fd48c89d9e69ec9a9d, just before alpha 3)

    OPi3lts - Built on commit 699f92cf62ce318a80d28d6e06ab55661e6c6c03 (this is using 21 alpha 3)

    Alpha 2 was OK, didn't notice any problems with that.

    Anyway, I went through CedarX vendor lib and found some differences regarding MPEG2 setup. You can try this kernel patch:

    http://ix.io/4Fsv However, I couldn't find any real world difference. Maybe something can be done on ffmpeg side.

    Yeah, that patch didn't make much of a difference with MPEG2. I'm OK with software decoding; most of the MPEG2 content is just OTA broadcasts and it's usually not much more than 1080, which neither the OPi3 nor the RPi4 has a problem decoding in software.

    Note that this is just a test; the IOMMU benefits are big. It protects unallocated memory from being used by HW (in other words, it prevents HW from corrupting memory) and it allows any amount of free memory to be allocated to video decoding (otherwise only CMA memory can be used, which is limited in size).
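
    When testing with the IOMMU disabled, the CMA pool becomes the limit for decoder buffers, so it is worth knowing how big it is. A sketch that reads the CmaTotal and CmaFree fields from /proc/meminfo (present when the kernel is built with CMA support); the function name cma_info is made up.

```shell
#!/bin/sh
# Print the CmaTotal and CmaFree lines from a meminfo-style file,
# defaulting to /proc/meminfo.
cma_info() {
    awk '/^Cma(Total|Free):/ { print $1, $2, $3 }' "${1:-/proc/meminfo}"
}

# Example: cma_info
```

    If CmaFree drops close to zero during playback, allocation failures rather than locking problems may explain what is seen in that configuration.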

    So far no full lock-ups (14 hours uptime). Kodi has locked up but not the shell (so this is different), and this is in dmesg; it seems to be a conflict between sugov (some sort of kernel thread or work queue?) and kodi.bin:

    http://ix.io/4Gr3




    I have one possible explanation - IOMMU. The H6 driver doesn't have locking implemented in some functions, but others do. Without that, race conditions are possible when allocating and freeing memory during video decoding and rendering.

    I had a similar experience as gadget_guy, using LE 12 build. I'll give it a shot disabling IOMMU as well.

    I don't know why the file system would cause a crash. Maybe a low memory situation. Can you also please enable KASAN? It should print an error report in dmesg whenever a memory issue is detected.

    It's a bit all over the place, SError, then unimplemented instruction, random freezing. Feels like a memory problem. I'll compile with KASAN as well and see what happens.

    My last test was to try and repeat what OP reported, turning off CEC from the Kodi side. And the device still froze, so my feeling is CEC is not the cause.


    Interesting, this was on the serial console:

    Can you give me your exact messages? I'm interested in numbers in them.

    Generally this is what I see.

    Sep 05 13:51:00 LibreELEC kernel: dw_hdmi_cec_hardirq: stat=11 LOW_DRIVE

    Sep 05 13:51:01 LibreELEC kernel: dw_hdmi_cec_hardirq: stat=11 LOW_DRIVE

    Sep 05 13:51:01 LibreELEC kernel: dw_hdmi_cec_hardirq: stat=11 LOW_DRIVE

    Sep 05 13:51:02 LibreELEC kernel: dw_hdmi_cec_hardirq: stat=11 LOW_DRIVE

    Sep 05 13:51:02 LibreELEC kernel: dw_hdmi_cec_hardirq: stat=11 LOW_DRIVE

    Sep 05 13:51:02 LibreELEC kernel: dw_hdmi_cec_hardirq: stat=11 LOW_DRIVE

    Sep 05 13:51:03 LibreELEC kernel: dw_hdmi_cec_hardirq: stat=11 LOW_DRIVE

    Sep 05 13:51:03 LibreELEC kernel: dw_hdmi_cec_hardirq: stat=11 LOW_DRIVE

    Sep 05 13:51:17 LibreELEC kernel: dw_hdmi_cec_hardirq: stat=11 LOW_DRIVE

    Sep 05 13:51:18 LibreELEC kernel: dw_hdmi_cec_hardirq: stat=11 LOW_DRIVE

    Sep 05 13:51:18 LibreELEC kernel: dw_hdmi_cec_hardirq: stat=11 LOW_DRIVE

    Sep 05 13:51:18 LibreELEC kernel: dw_hdmi_cec_hardirq: stat=11 LOW_DRIVE
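
    To put numbers on bursts like the one above, the messages can be bucketed per second. A sketch that reads journal-style lines on stdin and assumes the timestamp is the first three fields, as in the excerpt; the function name is made up.

```shell
#!/bin/sh
# Count dw_hdmi_cec_hardirq LOW_DRIVE lines per one-second timestamp (stdin).
low_drive_per_second() {
    awk '/dw_hdmi_cec_hardirq/ && /LOW_DRIVE/ { n[$1 " " $2 " " $3]++ }
         END { for (t in n) print t, n[t] }' | sort
}

# Example: journalctl -k --no-pager | low_drive_per_second
```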

    frakkin64 can you confirm that, with the new CEC patches, you don't see messages such as:

    cec-dw_hdmi: message 10 timed out

    dw_hdmi_cec_hardirq: stat=11 LOW_DRIVE


    It doesn't matter if CEC on TV is enabled or not.

    If CEC is on from the device side, yes. If CEC is off from the device side, no. But it's not frequent, usually a few times at start up, and then throughout usage.