NFS Down when trying to play a video

  • Hi,

    I am running the latest LE (10.0.0.2) with the latest FW for my Rpi4 (with the uHAT and TVH installed)

    The problem I am facing is that when I start watching a recorded video or when I am watching it (typically, max 10 minutes after I start), everything stops.

    In the logs I can see many messages stating that NFS server is down (I record videos to my NAS):

    Code
    Apr 13 22:29:42 LibreELEC kernel: nfs: server 192.168.1.120 not responding, still trying
    Apr 13 22:29:42 LibreELEC kernel: nfs: server 192.168.1.120 not responding, still trying
    Apr 13 22:29:42 LibreELEC kernel: nfs: server 192.168.1.120 not responding, still trying

    I don't have any WiFi issues, at least, by monitoring the WiFi I see connection is seen as UP, with a good DBm. What I saw some days ago is that the MAC address of the NAS was seen as "incomplete" in the ARP table, so I created a script that would bring wifi down and up when this would occur. It works but I still get disruption of the service.

    Yesterday, though, the MAC was there in the table and WiFi was up, as you can see here below (the fields show Time, DBm, connection status, MAC of the NAS, MAC of another Raspberry I have: I send statistics to a Grafana server located there)


    The situation recovered only at 4M, when the MAC went down and I automatically reconnected to the network (see the reason=3)

    Code
    Thu Apr 14 04:45:32 CEST 2022,-47,yes,24:5e:be:3a:9e:dd,b8:27:eb:ca:5d:2d
    Thu Apr 14 04:45:47 CEST 2022,-47,yes,24:5e:be:3a:9e:dd,b8:27:eb:ca:5d:2d
    Thu Apr 14 04:46:03 CEST 2022,,,,
    Thu Apr 14 04:46:18 CEST 2022,-47,yes,24:5e:be:3a:9e:dd,b8:27:eb:ca:5d:2d
    Thu Apr 14 04:46:33 CEST 2022,-47,yes,24:5e:be:3a:9e:dd,b8:27:eb:ca:5d:2d
    Code
    Apr 14 04:46:02 LibreELEC wpa_supplicant[536]: wlan0: CTRL-EVENT-DISCONNECTED bssid=dc:39:6f:f2:32:9d reason=3 locally_generated=1
    Apr 14 04:46:09 LibreELEC kernel: nfs: server 192.168.1.120 OK
    Apr 14 04:46:11 LibreELEC tvheadend[31042]: http: Starting HTTP server 0.0.0.0:9981
    Apr 14 04:46:11 LibreELEC tvheadend[31042]: htsp: Starting HTSP server 0.0.0.0:9982

    Of course, I couldn't watch anything for the entire night, when this happens, the entire Kodi gets stuck basically: if you go to System Info and go to the Network part, the GUI freezes and CEC stops responding as well.

    This also interrupted the recording of the program I wanted to watch, so I lost it. It also stopped sending data to the Grafana Server, which has a different IP Address and does not use NFS (of course, it's an HTTP server)

    Note that when TVH records the video (towards the very same NAS), I never lose a bit, and it's a 220 minutes recording. But very often (or maybe always), when I try to play it, it all gets wrong and stuck. If my script works, then I have problems for 10 minutes, but yesterday it was a no-go

    At a certain point in time, when the GUI was stuck, I also got this message in the logs, but I didn't see any direct consequences - maybe this is not relevant and just a consequence of all the rest

    Code
    Apr 13 23:54:18 LibreELEC kernel: tvh:upnp invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

    It's not a temperature problem because I have a fan shim controlled by a Kodi add-on (which worked even when the GUI was frozen):

    Code
    2022-04-13 22:52:40.726 T:959      INFO <general>: 66.2Enabling fan! temp:66.2
    2022-04-13 22:54:42.977 T:959      INFO <general>: 51.6 Disabling fan! temp:51.6
    2022-04-13 22:59:23.112 T:959      INFO <general>: 66.2Enabling fan! temp:66.2
    2022-04-13 23:01:15.162 T:959      INFO <general>: 51.6 Disabling fan! temp:51.6
    2022-04-13 23:06:25.855 T:959      INFO <general>: 66.2Enabling fan! temp:66.2
    2022-04-13 23:08:12.813 T:959      INFO <general>: 51.1 Disabling fan! temp:51.1
    2022-04-13 23:13:03.121 T:959      INFO <general>: 66.2Enabling fan! temp:66.2
    2022-04-13 23:15:10.446 T:959      INFO <general>: 51.1 Disabling fan! temp:51.1

    Is there a way to solve the NFS issue? Did anyone of you experience this problem?

    Note that the server is not down because the other Rpi is connected to it and I don't see any issues there: I send influxdb data every 30 seconds and I didn't lose any single bucket yesterday. I also use that NAS with my other Kodi on a FireTV and I have never ever experienced any issues (it uses WiFi connectivity as well). And on top, recording from LE (TVH) to the NAS always works.

    Luca

  • I have taken some pcaps from my router, to see what's going on.

    What I see is that, all of a sudden, while the WiFi is still active, 802.11 packets between the router and the rpi4 are sent and received and the IP address is maintained (so, apparently, it's all good), nobody can get an ARP answer from the rpi4. The router tries, the other raspberry tries, with no success.

    It all starts at 22:38, when the capture begins and I try to connect to my NFS server. At 22:41 the vpn service detects something is wrong and connmand tears down the routes, but the WiFi gets disconnected only at 22:47. As soon as it reconnects, ARP tables are correctly populated and everything is back to normal.

    Here you see the router sending an ARP request to the RPI4, getting no answer. Also the other Raspberry tries, with no luck

    Wifi gets disconnected at 22:47 and the rpi4 reconnects, ARP is populated

    At this point in time, the NAS (.120) is also discovered and I can start watching the TV program (10 minutes after my initial atrempt)

    So it looks like the WiFi is up but ARP requests fail, how can it be? And how could I write a script that manually tears WiFi down and up if I see something like that?

    I tried by checking the ARP table of the RPI4: if I don't see the NAS IP, I will reconnect, but this is not working because I believe ARP ageing is about 5 minutes, so I arrive quite late

    I tried with ping, but it didn't work either - well, I tried only once and in that case I was getting a continuous rebuffering, with PVR disconnecting and reconnecting. From the logs, ping was always successful, so probably this is not working either.

    I cannot think of other ideas. So far I am trying storing my TV programs on a USB stick connected to the rpi4, I did some tests and apparently it's working, but still, I would like to find a solution.

  • Well, I have bought an USB Pendrive, I will store the data there, without relying on my LAN and on the WiFi anymore.

    I have lost my battle, but I have gained in usability.

    Case closed.

  • What I have seen is that there is a problem on the WiFi whenever I use it "heavily".

    For instance, doing an scp on my mac from my Rpi4 in order to download a video file from my LE server made the wifi crash when transfer rate was 18%. And there is nothing I can do, it will always fail.

    So I guess the problem is in the WiFi firmware

    Code
    Lucas-iMac:Movies luca$ scp [email protected]:/var/media/DVR/RegistrazioniTV/CLV/CLV2023-01-25.ts .
    CLV2023-01-25.ts                                                                                                                                                                              18% 1448MB  28.9KB/s - stalled -client_loop: send disconnect: Broken pipe
    scp: Connection closed
  • Still interested in this network issue, I found an issue by chance:

    if I copy a large file to my NAS, after a while the connection will drop. I see very high 1m load usage and free mem goes to zero.

    If I nohup copy the same file to the NAS, I still lose connectivity but the file is copied suucesfully (6GB long).

    I think this is the problem: when kodi opens the file, the throughput is kind of too big and the rpi4 hangs.

    Do you know how this could be solved? Some network settings in the rpi?

    Thanks