DVB issue since LE switched to kernel 4.9.x

  • Alright :)

    I just wanted to clarify what I did (and did not do) to get to 4.9, since there seemed to be some confusion about how I (and someone else) got there.

    I'll get back here in a few hours with some results of the tests on one or both of my RPIs.

  • rpi-update bcc6146e102d85b1aa214855ad7aae278d3bd269

    gets me:

    Linux pi2 4.9.0-v7+ #939 SMP Thu Dec 15 18:07:48 GMT 2016 armv7l ARMv7 Processor rev 5 (v7l) BCM2835 GNU/Linux

    And the artifacts are back.

    rpi-update 7a47836821b92efa569500b3382b1812082e42d3

    gets me:

    Linux pi2 4.8.13-v7+ #937 SMP Fri Dec 9 17:45:13 GMT 2016 armv7l ARMv7 Processor rev 5 (v7l) BCM2835 GNU/Linux

    No artifacts. Sticking with this build for now.

    Let me know if/when there's a test build / git hash to localize things further.
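
    For anyone wanting to reproduce the comparison, the workflow is roughly as follows (the /boot/.firmware_revision check is an assumption about where rpi-update records the installed hash; adjust if your setup differs):

    Code
    # show which rpi-update firmware revision is currently installed (assumed location)
    cat /boot/.firmware_revision
    # roll back to the known-good 4.8.13 firmware hash quoted above, then reboot and re-check
    sudo rpi-update 7a47836821b92efa569500b3382b1812082e42d3
    sudo reboot
    uname -a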

  • I can confirm everything works as it should on the RPi with kernel 4.4 (original Raspbian)

    Next test: I have connected the DVBSky to the other RPi (with kernel reverted to 4.8) and have scheduled some recordings on it.

    Results will be in tomorrow around noon. It's past midnight here, and I have to call it a day (night) for now.

  • UPDATE 1: The reverted 4.8-kernel also works as expected: 2 recordings, 0 errors.


    UPDATE 2:

    Code
    pi@thuban:~ $ uname -a
    Linux thuban 4.8.13-v7+ #937 SMP Fri Dec 9 17:45:13 GMT 2016 armv7l GNU/Linux
    pi@thuban:~ $ sudo rpi-update bcc6146e102d85b1aa214855ad7aae278d3bd269
    pi@thuban:~ $ sudo reboot
    pi@thuban:~ $ uname -a
    Linux thuban 4.9.0-v7+ #939 SMP Thu Dec 15 18:07:48 GMT 2016 armv7l GNU/Linux

    And now the errors are back.

    I think my situation is clear: There's nothing wrong with my hardware or config, just something weird in the 4.9 kernel.

    This is also another confirmation that the problem is not in Kodi or LibreELEC.


  • Given that I also see corruption in the largest of MythTV's MySQL database files (updated while recordings are in progress and hosted by me on a USB stick) when using 4.9-based kernels (although much less frequently than the artifacts in recorded HD video), my money's on changes to the USB subsystem / interrupt priority rather than any tweaks to V4L code:

    Code
    -rw-rw---- 1 mysql mysql 331575000 Jun 3 22:08 /var/lib/mysql/mythconverg/recordedseek.MYD
    -rw-rw---- 1 mysql mysql 308227072 Jun 3 22:08 /var/lib/mysql/mythconverg/recordedseek.MYI

    I was getting rather too familiar with myisamchk -o as a command, needing to run it at least 3-4 times a month.

    Since switching back to 4.6 / 4.8 based kernels, I've not needed to use it once.
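
    For reference, the repair workflow described above looks roughly like this (a sketch only; the service name and stopping MySQL first are assumptions based on a default Debian/Raspbian install):

    Code
    # stop MySQL so the table isn't open while it is being repaired (assumed default service name)
    sudo service mysql stop
    # myisamchk -o (--safe-recover) repairs the MyISAM table via its index file
    sudo myisamchk -o /var/lib/mysql/mythconverg/recordedseek.MYI
    sudo service mysql start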

  • Ok, some further investigations. I've moved from the pi2 to a core2 x86 PC, using the same DVB-T2 USB stick.

    (To run a vanilla kernel and therefore get closer to the commit which introduced the problem)

    blackbox linux # git remote -v

    origin git://git.kernel.org/pub/scm/linux/…inux-stable.git
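
    (For reference, a shallow clone along these lines would set up the same tree; the full repository URL is an assumption on my part, since it is truncated above:)

    Code
    # shallow clone of the stable tree to keep the initial download small (URL assumed, not quoted from the post)
    git clone --depth=1 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
    cd linux-stable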

    I found a suspicious looking commit:

    commit e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c

    Merge tag 'usb-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

    Pull usb/phy/extcon updates from Greg KH:

    "Here is the big USB, and PHY, and extcon, patchsets for 4.9-rc1....

    So I built two kernels.

    One based on e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c and one based on the previous commit, e6dce825fba05f447bd22c865e27233182ab3d79.
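
    The build for each candidate was along these lines (a rough sketch using generic x86 config/install steps, not necessarily the exact commands used):

    Code
    # check out one candidate commit and build/install a test kernel; repeat for the other hash
    git checkout e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c
    make olddefconfig
    make -j"$(nproc)" bzImage modules
    sudo make modules_install install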

    Based on e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c

    Within a few seconds of viewing BBC1 HD, there are the artifacts again.

    Based on e6dce825fba05f447bd22c865e27233182ab3d79

    No artifacts seen in about 5 minutes of viewing BBC1 HD.

    Maybe gregkh can help identify which of the many changes it contains is the most likely culprit?

  • metaron, good work!

    That provides evidence that it is an upstream Linux kernel bug that affects multiple platforms.

    Reporting upstream may be useful: Reporting bugs for the Linux kernel

    You have identified a merge commit that seems responsible, which is helpful, but it is a huge merge of 343 commits.

    You can view these commits with:

    Code
    git log  e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c^..e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c
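
    If you just want a count of those commits rather than the full listing, something like this should work:

    Code
    git rev-list --count e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c^..e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c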

    I think you can narrow it down to a single commit with:

    Code
    dc4@dc4-XPS13-9333:~/projects/linux$ git bisect start
    dc4@dc4-XPS13-9333:~/projects/linux$ git bisect good e6dce825fba05f447bd22c865e27233182ab3d79
    dc4@dc4-XPS13-9333:~/projects/linux$ git bisect bad e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c
    Bisecting: 184 revisions left to test after this (roughly 8 steps)

    Build this kernel and check if it is good/bad. Then run either "git bisect good" or "git bisect bad" and it will provide you with a new commit to build from. After about 8 builds you should know which specific commit is responsible.
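
    Each iteration then looks something like this (the build/install commands are just an example; use whatever you normally build with):

    Code
    # build and install the kernel that git bisect checked out, then reboot and test playback
    make olddefconfig && make -j"$(nproc)" && sudo make modules_install install
    sudo reboot
    # after testing: tell bisect the result and it checks out the next commit to try
    git bisect good    # or: git bisect bad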

  • @popcornmix, following your instructions I get:

    Code
    blackbox linux # git bisect start
    blackbox linux # git bisect good e6dce825fba05f447bd22c865e27233182ab3d79
    blackbox linux # git bisect bad e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c
    Bisecting: 14 revisions left to test after this (roughly 4 steps)
    [2ad9d544f2497a7bf239c34bd2b86fd19683dbb5] cdc-acm: hardening against malicious devices

    Which seems slightly less than the 184 revisions you were expecting...

    Code
    git log  e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c^..e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c

    Only shows 31 individual commits. Might it be because I cloned with --depth=1 then ran --deepen=100?

    Anyhow, I'll see what this bisecting lark turns up.

  • e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c is the first bad commit

    which is what I knew already ... ??? ...

    Code
    ab21b63 HEAD@{0}: checkout: moving from decc5360f23e9efe0252094f47f57f254dcbb3a9 to ab21b63e8aedfc73565dd9cdd51eb338341177cb
    decc536 HEAD@{1}: checkout: moving from a2f195a73eba807006fb0cb882ecb552c06eea00 to decc5360f23e9efe0252094f47f57f254dcbb3a9
    a2f195a HEAD@{2}: checkout: moving from e8624859dde2ad07633dac7ec86629a516411ea1 to a2f195a73eba807006fb0cb882ecb552c06eea00
    e862485 HEAD@{3}: checkout: moving from 2ad9d544f2497a7bf239c34bd2b86fd19683dbb5 to e8624859dde2ad07633dac7ec86629a516411ea1
    2ad9d54 HEAD@{4}: checkout: moving from post-usb-change to 2ad9d544f2497a7bf239c34bd2b86fd19683dbb5

    Any idea what I'm doing wrong?

  • Code
    blackbox linux # git fetch --unshallow
    ....
    blackbox linux # git bisect start
    Already on 'post-usb-change'
    blackbox linux # git bisect good e6dce825fba05f447bd22c865e27233182ab3d79
    blackbox linux # git bisect bad e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c
    Bisecting: 184 revisions left to test after this (roughly 8 steps)
    [54a2ec67f1db62a763f57b7f8f2e82874f5f358b] usb: ohci: Allow ohci on omap5 also

    That looks more like it....

  • I've also noticed that running git status or git bisect good/bad while watching video is a good confirmation test, since it causes 100% usage of both CPU cores and significant disk access (the mythtv video storage directory and the kernel git repository are on the same drive).

    • When I've got a 'bad' version I get significant video artifacts as soon as disk access starts.
    • When I've got a good version I get interrupted playback but no artifacts, and the audio, having got very out of sync, eventually catches up without any noticeable video corruption.

    What's confusing me at the moment is that I can't currently get any artifacts to appear with 54a2ec67f1db62a763f57b7f8f2e82874f5f358b. I've reset the bisect and am trying a fresh build based on this commit to see if that shows the artifacts I expect, but I'm a bit dubious that the bisected builds actually worked all the time (several of the 'good' builds were rather quick compiles without much, if any, content, which isn't what I was expecting).

    Watch this space....

  • 54a2ec67f1db62a763f57b7f8f2e82874f5f358b is not going to be the commit that caused the problem (it only affects the SOC_OMAP5 platform, which isn't the Pi or PC).

    If you ever report the wrong answer to git bisect good/bad then it will identify the wrong commit.

    The occasional quick compiles are expected. Especially after a few iterations, the number of commits jumped over will be small, and if these don't affect header files the amount of rebuilding may be small.

  • You can run "git bisect log" to see what commits you reported as good/bad.

    You might want to double-check whether you reported bad on 54a2ec67f1db62a763f57b7f8f2e82874f5f358b or on an older commit.
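
    If one answer does turn out to be wrong, you don't have to start again from nothing: the log can be saved, edited to drop the suspect entry, and replayed, along these lines:

    Code
    git bisect log > bisect.log     # save the recorded good/bad answers
    # edit bisect.log to remove the wrong entry, then:
    git bisect reset
    git bisect replay bisect.log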

  • Well I didn't trust the results I got last time so I've tried to repeat the process again from scratch:

    Code
    git bisect start
    git bisect good e6dce82
    git bisect bad e6445f5

    But I got all the way through without finding a single bad build.

    Obviously what I thought was a guaranteed environment to reproduce the problem quickly isn't.

    I did mark some of the earlier builds as bad the first time around, but unfortunately didn't make an independent record of the hashes involved.

    (I have this time, but none appear to have failed...)

    Is there a log somewhere I might be able to resurrect? (git bisect log obviously only shows the most recent bisect)