Alright
I just wanted to clarify what I did (and did not do) to get to 4.9, since there seemed to be some confusion about how I (and someone else) got there.
I'll get back here in a few hours with some results of the tests on one or both of my RPIs.
rpi-update bcc6146e102d85b1aa214855ad7aae278d3bd269
gets me:
Linux pi2 4.9.0-v7+ #939 SMP Thu Dec 15 18:07:48 GMT 2016 armv7l ARMv7 Processor rev 5 (v7l) BCM2835 GNU/Linux
And the artifacts are back.
rpi-update 7a47836821b92efa569500b3382b1812082e42d3
gets me:
Linux pi2 4.8.13-v7+ #937 SMP Fri Dec 9 17:45:13 GMT 2016 armv7l ARMv7 Processor rev 5 (v7l) BCM2835 GNU/Linux
No artifacts. Sticking with this build for now.
Let me know if/when there's a test build / git hash to localize things further.
I can confirm everything works as it should on the RPi with kernel 4.4 (original Raspbian).
Next test: I have connected the DVBSky to the other RPi (with kernel reverted to 4.8) and have scheduled some recordings on it.
Results will be in tomorrow around noon. It's past midnight here, and I have to call it a day (night) for now.
UPDATE 1: The reverted 4.8-kernel also works as expected: 2 recordings, 0 errors.
UPDATE 2:
pi@thuban:~ $ uname -a
Linux thuban 4.8.13-v7+ #937 SMP Fri Dec 9 17:45:13 GMT 2016 armv7l GNU/Linux
pi@thuban:~ $ sudo rpi-update bcc6146e102d85b1aa214855ad7aae278d3bd269
pi@thuban:~ $ sudo reboot
pi@thuban:~ $ uname -a
Linux thuban 4.9.0-v7+ #939 SMP Thu Dec 15 18:07:48 GMT 2016 armv7l GNU/Linux
And now the errors are back.
I think my situation is clear: There's nothing wrong with my hardware or config, just something weird in the 4.9 kernel.
This is also further confirmation that the problem is not in Kodi or LibreELEC.
Given that I also see corruption in the largest of mythtv's mysql database files (updated while recordings are in progress and hosted on a USB stick) when using 4.9-based kernels (although much less frequently than the artifacts in recorded HD video), my money's on changes to the USB subsystem / interrupt priority rather than any tweaks to V4L code:
-rw-rw---- 1 mysql mysql 331575000 Jun 3 22:08 /var/lib/mysql/mythconverg/recordedseek.MYD
-rw-rw---- 1 mysql mysql 308227072 Jun 3 22:08 /var/lib/mysql/mythconverg/recordedseek.MYI
I was getting rather too familiar with myisamchk -o as a command, needing to run it at least 3-4 times a month.
Since switching back to 4.6 / 4.8 based kernels, I've not needed to use it once.
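For reference, that repair looks roughly like this (stopping mysqld first and the 'mysql' service name are my assumptions; the table path is the one listed above):
sudo service mysql stop
# -o / --safe-recover uses the old, slower recovery method; point it at the table's .MYI index file
sudo myisamchk -o /var/lib/mysql/mythconverg/recordedseek.MYI
sudo service mysql start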
Ok, some further investigations. I've moved from the pi2 to a core2 x86 PC, using the same DVB-T2 USB stick.
(To run a vanilla kernel and therefore get closer to the commit which introduced the problem)
blackbox linux # git remote -v
origin git://git.kernel.org/pub/scm/linux/…inux-stable.git
I found a suspicious looking commit:
commit e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c
Merge tag 'usb-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull usb/phy/extcon updates from Greg KH:
"Here is the big USB, and PHY, and extcon, patchsets for 4.9-rc1....
So I built two kernels.
One based on e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c and one based on the previous commit e6dce825fba05f447bd22c865e27233182ab3d79
Based on e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c:
Within a few seconds of viewing BBC1 HD, there are the artifacts again.
Based on e6dce825fba05f447bd22c865e27233182ab3d79:
No artifacts seen in about 5 minutes of viewing BBC1 HD.
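For reference, each of those two test builds was along these lines (starting from the running kernel's config and the exact make targets are assumptions, this is a sketch rather than the exact commands used):
git checkout e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c   # or e6dce825... for the second build
cp /boot/config-$(uname -r) .config                     # start from the distro config (assumption)
make olddefconfig                                       # fill in any new options with defaults
make -j4 bzImage modules
sudo make modules_install install
sudo reboot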
Maybe gregkh can help identify which of the many changes it contains is the most likely culprit?
metaron, good work!
That provides evidence that this is an upstream Linux kernel bug affecting multiple platforms.
Reporting upstream may be useful: Reporting bugs for the Linux kernel
You have identified a merge commit that seems responsible, which is helpful, but it is a huge merge of 343 commits.
You can view these commits with:
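Presumably something like a range log over the two hashes already quoted, e.g.:
git log --oneline e6dce825fba05f447bd22c865e27233182ab3d79..e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c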
I think you can narrow it down to a single commit with:
dc4@dc4-XPS13-9333:~/projects/linux$ git bisect start
dc4@dc4-XPS13-9333:~/projects/linux$ git bisect good e6dce825fba05f447bd22c865e27233182ab3d79
dc4@dc4-XPS13-9333:~/projects/linux$ git bisect bad e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c
Bisecting: 184 revisions left to test after this (roughly 8 steps)
Build this kernel and check whether it is good or bad. Then run either "git bisect good" or "git bisect bad" and it will give you a new commit to build from. After about 8 builds you should know which specific commit it was.
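Each bisect iteration then looks something like this (build targets assumed for an x86 test box; adjust for the Pi):
# build and install the kernel that git bisect has checked out
make -j4 bzImage modules
sudo make modules_install install
sudo reboot
# after rebooting, test for artifacts, then back in the repo:
git bisect good    # if no artifacts were seen
git bisect bad     # if artifacts appeared
# git checks out the next commit to test; repeat until it reports the first bad commit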
Thanks for the instructions, popcornmix. Will take a look this week and see if I can localize it further.
@popcornmix, following your instructions I get:
blackbox linux # git bisect start
blackbox linux # git bisect good e6dce825fba05f447bd22c865e27233182ab3d79
blackbox linux # git bisect bad e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c
Bisecting: 14 revisions left to test after this (roughly 4 steps)
[2ad9d544f2497a7bf239c34bd2b86fd19683dbb5] cdc-acm: hardening against malicious devices
Which is rather fewer than the 184 revisions you were expecting...
Only shows 31 individual commits. Might it be because I cloned with --depth=1 then ran --deepen=100?
Anyhow will see what this bisecting lark turns up.
e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c is the first bad commit
which is what I knew already ... ??? ...
ab21b63 HEAD@{0}: checkout: moving from decc5360f23e9efe0252094f47f57f254dcbb3a9 to ab21b63e8aedfc73565dd9cdd51eb338341177cb
decc536 HEAD@{1}: checkout: moving from a2f195a73eba807006fb0cb882ecb552c06eea00 to decc5360f23e9efe0252094f47f57f254dcbb3a9
a2f195a HEAD@{2}: checkout: moving from e8624859dde2ad07633dac7ec86629a516411ea1 to a2f195a73eba807006fb0cb882ecb552c06eea00
e862485 HEAD@{3}: checkout: moving from 2ad9d544f2497a7bf239c34bd2b86fd19683dbb5 to e8624859dde2ad07633dac7ec86629a516411ea1
2ad9d54 HEAD@{4}: checkout: moving from post-usb-change to 2ad9d544f2497a7bf239c34bd2b86fd19683dbb5
Any idea what I'm doing wrong?
Only shows 31 individual commits. Might it be because I cloned with --depth=1 then ran --deepen=100?
Yes, I don't think a shallow clone will have the required history.
I think you should clone the complete repo for this to work.
blackbox linux # git fetch --unshallow
....
blackbox linux # git bisect start
Already on 'post-usb-change'
blackbox linux # git bisect good e6dce825fba05f447bd22c865e27233182ab3d79
blackbox linux # git bisect bad e6445f52d9c8b0e6557a45fa7d0e8e088d430a8c
Bisecting: 184 revisions left to test after this (roughly 8 steps)
[54a2ec67f1db62a763f57b7f8f2e82874f5f358b] usb: ohci: Allow ohci on omap5 also
That looks more like it....
blackbox linux # git bisect good
54a2ec67f1db62a763f57b7f8f2e82874f5f358b is the first bad commit
commit 54a2ec67f1db62a763f57b7f8f2e82874f5f358b
Author: Tony Lindgren <[email protected]>
Date: Fri Sep 9 10:58:24 2016 -0700
usb: ohci: Allow ohci on omap5 also
With LPAE config we don't have omap3 or omap4 selected for
omap5 variants.
Signed-off-by: Tony Lindgren <[email protected]>
Acked-by: Alan Stern <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
:040000 040000 1e547c9b538119059712bcf6ba103eb639cd7a42 ba064f1f1d062090d03892dd0ef585c356343703 M drivers
I've also noticed that running git status or git bisect good/bad while watching video is a good confirmation test, since it causes 100% dual-core CPU usage and significant disk access (the mythtv video storage directory and the kernel git repository are on the same drive).
What's confusing me at the moment is that I can't currently get any artifacts to appear with 54a2ec67f1db62a763f57b7f8f2e82874f5f358b. I've reset the bisect and am trying a fresh build based on this commit to see if it shows the artifacts I expect, but I'm a bit dubious that the bisected builds actually worked every time (several of the 'good' builds were rather quick compiles without much, if any, content, which isn't what I was expecting).
Watch this space....
54a2ec67f1db62a763f57b7f8f2e82874f5f358b is not going to be the commit that caused the problem (it only affects the SOC_OMAP5 platform, which isn't the Pi or the PC).
If you ever report the wrong answer to git bisect good/bad then it will identify the wrong commit.
The occasional quick compile is expected. Especially after a few iterations, the number of commits jumped over will be small, and if these don't affect header files the amount of rebuilding may be small.
You can run "git bisect log" to see what commits you reported as good/bad.
You might want to double-check whether you reported bad for 54a2ec67f1db62a763f57b7f8f2e82874f5f358b or for an older commit.
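If the log does turn up a wrong answer, git's documented way to recover without starting over is to edit the saved log and replay it:
git bisect log > bisect.log
# delete the incorrect good/bad line from bisect.log in an editor
git bisect reset
git bisect replay bisect.log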
Well, I didn't trust the results I got last time, so I've tried to repeat the process again from scratch.
But I got all the way through without finding a single bad build.
Obviously what I thought was a guaranteed environment to reproduce the problem quickly isn't.
I did mark some of the earlier builds as bad the first time around, but unfortunately didn't make an independent record of the hashes involved.
(I have this time, but none appear to have failed...)
Is there a log somewhere I might be able to resurrect? (git bisect log obviously only shows the most recent bisect)
Is there a log somewhere I might be able to resurrect? (git bisect log obviously only shows the most recent bisect)
I think only the current git bisect log is saved.
OK, will try again with a more robust (and slower) approach: making recordings and reviewing them for artifacts afterwards.
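A minimal sketch of the kind of unattended test recording I have in mind, using dvbv5-zap from dvb-tools (the channel file, channel name and duration are assumptions; option names may differ between v4l-utils versions):
# tune to the channel and dump the transport stream to a file for 10 minutes
dvbv5-zap -c dvb_channel.conf -o test-$(date +%Y%m%d-%H%M).ts -t 600 "BBC ONE HD"
# afterwards, play the .ts back and check it for artifacts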