RPi hangs sometimes, how to auto-reboot it?

  • An RPi2 with LE7 and TVH is always on. It records DVB-T to an external disk. Before a recording it mounts the disk, and after the recording the disk is unmounted, since the RPi will surely hang if the external disk goes to sleep while mounted. That has worked perfectly well for years.

    From time to time, about once every 3-4 months, the RPi hangs and I cannot find out why. I can ping it. I cannot SSH to it, but it actively refuses telnet connections. My router shows the RPi as active and alive. It is a remote machine, so I cannot test whether Kodi is working over HDMI. I do not know if it works partially or not at all, but it seems to me that it is not completely hung / dead. The rest of that network works perfectly.

    The script that runs on that RPi every 12 minutes (which controls it) apparently stops working. I can see that this script no longer connects (ssh) to my router, nor does it make any log entries on the local memory card. I cannot connect to TVH, and TVH does not make any recordings from that point on. My only way out is to power cycle the RPi; after that all is well again.

    I have just added a cron entry on the RPi that tests whether it can successfully ssh to the router and, if not, reboots the RPi. But will that cron job run when the problem arises?

    So, is there something internal that I can use to test and reboot the RPi if it loses the connection (cannot ssh) to the router?

    Is there a better solution?
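
    A minimal sketch of the cron check described above, written so that a single failed ssh attempt does not trigger a reboot. The router address, state file path, failure threshold and the `--run` switch are all hypothetical; the ssh options assume an OpenSSH client with key-based login to the router. It only helps if cron itself keeps running during the hang, which this thread does not establish.

```shell
#!/bin/sh
# Sketch: reboot only after several consecutive failed router checks.
# ROUTER, STATE_FILE and MAX_FAILS are assumed values, not from the thread.
ROUTER="${ROUTER:-192.168.1.1}"
STATE_FILE="${STATE_FILE:-/tmp/router_fail_count}"
MAX_FAILS="${MAX_FAILS:-3}"

# Print the next failure count: 0 after a success, previous + 1 after a failure.
# $1 = previous count, $2 = exit status of the check (0 means success).
next_fail_count() {
    prev=${1:-0}
    ok=$2
    if [ "$ok" -eq 0 ]; then
        echo 0
    else
        echo $((prev + 1))
    fi
}

# Succeeds if the router accepts a non-interactive ssh login within 10 seconds.
router_reachable() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$ROUTER" true >/dev/null 2>&1
}

# Run one check, store the updated counter, reboot once the limit is reached.
check_once() {
    prev=$(cat "$STATE_FILE" 2>/dev/null)
    router_reachable
    count=$(next_fail_count "${prev:-0}" "$?")
    echo "$count" > "$STATE_FILE"
    if [ "$count" -ge "$MAX_FAILS" ]; then
        reboot
    fi
}

# Only act when called with --run, e.g. from cron every 12 minutes.
if [ "${1:-}" = "--run" ]; then
    check_once
fi
```

    With a threshold of 3 and a 12-minute interval, the RPi would reboot after roughly 36 minutes without a working ssh connection. Keeping the counter under /tmp assumes that directory is cleared on reboot, so the count starts from zero again.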

  • Over time we found some memory fragmentation issues that affect long-uptime systems. Check free RAM, and if that's the issue, start planning an update to 9.0 as that's when it'll be resolved.

    OK, that seems very plausible, since it typically takes several months for the problem to develop and my script runs every 12 minutes. I thought the same, as I have a similar problem with my router, also after about 90 days. But for now I want to resolve it on the current version. I dislike upgrades if I do not have clear reasons.

    What to monitor? I am perfectly happy to monitor something and reboot if it gets over certain amount.

    The largest memory consumers (only those above 1 MB) are:

    top

    Mem: 270904K used, 483752K free,

    527m 71.3 0 0.8 /usr/lib/kodi/kodi.bin --standalon

    287m 38.8 1 0.0 {tvheadend} /storage/.kodi/addons/

    ps does not show me memory, but I could use top like this:

    top -n1 | grep kodi.b[i]n

    top -n1 | grep tvheade[n]d

    Or is it better to monitor just the free memory and reboot if it drops below some limit?

    top -n1 | grep fre[e] | cut -d" " -f4

    or probably better:

    mem_free=$(( $( free | awk 'NR==2 {print $4}' ) / 1024 )); echo $mem_free

    What should I set as the reboot limit? 100 kB? Or is this not a good tactic for catching this problem?

    Or I could simply do a preemptive RPi reboot, say, once a month. But I would prefer to know exactly why I am doing it rather than simply reboot without learning anything.
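
    To learn something before rebooting, one option is to log a memory snapshot on each run of the 12-minute script, so the last entries before a hang can be inspected after the power cycle. This is only a sketch: the log path and the `--run` switch are hypothetical, and `/proc/buddyinfo` is included on the assumption that the fragmentation mentioned above would show up as a shortage of large free blocks.

```shell
#!/bin/sh
# Sketch: append a timestamped memory snapshot to a log file.
# LOG is an assumed path, not taken from the thread.
LOG="${LOG:-/storage/memwatch.log}"

# Format one log line from a timestamp ($1) and a free-memory figure in kB ($2).
format_line() {
    printf '%s free_kb=%s\n' "$1" "$2"
}

# Write the current MemFree value and, if readable, the buddy allocator state.
log_snapshot() {
    free_kb=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
    format_line "$(date '+%Y-%m-%d %H:%M:%S')" "$free_kb" >> "$LOG"
    # buddyinfo lists free blocks per size order; very few blocks in the
    # right-hand (large) columns would hint at fragmentation.
    if [ -r /proc/buddyinfo ]; then
        cat /proc/buddyinfo >> "$LOG"
    fi
}

# Only act when called with --run, e.g. from the existing 12-minute script.
if [ "${1:-}" = "--run" ]; then
    log_snapshot
fi
```

    The log grows by a few lines per run, so it would need trimming now and then on a memory card.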

    • Official Post

    I have a similar setup to yours, albeit with an RPi3 and LE 8.2.3 with DVB-T via an Xbox TV tuner, and I haven't had any issues with locking up. I do, however, have Kodi shut down and HDMI + LEDs turned off (if that makes any difference).

    If you don't really need Kodi running (it is not needed for TVH), then shut it down; it may solve your issue.

    I also reduced GPU memory to 112 MB and periodically monitor my box with nmon (in the LE system repository); this leaves around 600M free.

    There is also a Raspbian GPIO switch shutdown available, which I have tested on Raspbian Stretch, where it works well. I haven't tested it on LE, but it might allow a clean shutdown/reboot if you need it.

    There is also the Raspbian watchdog service, which does work under Raspbian, but I believe it has caused issues with LE and is no longer implemented because of those problems.

  • As for monitoring the free memory, this may not be a good way to monitor for this particular problem. Currently, I have one active recording, and free gives:

                 total       used       free     shared    buffers     cached
    Mem:        754656     729016      25640       7624       1808     503888
    -/+ buffers/cache:     223320     531336

    So if I monitor the first row, free memory can drop to about 25 MB while everything still works perfectly. If I monitor the row without buffers/cache, the value remains very high. If someone has a better idea, I am open to it.
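
    If free memory is monitored at all, the figure without buffers/cache can be computed directly from `/proc/meminfo` instead of parsing `free` or `top`. The sketch below sums MemFree, Buffers and Cached, which reproduces the 531336 kB shown above. The 50 MB limit and the `--run` switch are hypothetical, and as noted, this value may stay high even when the hang is caused by fragmentation.

```shell
#!/bin/sh
# Sketch: reboot when memory excluding buffers/cache falls below a limit.
# LIMIT_KB is an assumed threshold (50 MB), not a recommendation from the thread.
LIMIT_KB="${LIMIT_KB:-51200}"

# Print MemFree + Buffers + Cached in kB, read from a meminfo-style file.
# $1 = file to read, defaults to /proc/meminfo.
avail_kb() {
    awk '/^(MemFree|Buffers|Cached):/ {sum += $2} END {print sum + 0}' "${1:-/proc/meminfo}"
}

# Succeeds if the value in $1 is below the limit in $2 (both in kB).
is_low() {
    [ "$1" -lt "$2" ]
}

# Only act when called with --run, e.g. from cron.
if [ "${1:-}" = "--run" ]; then
    if is_low "$(avail_kb)" "$LIMIT_KB"; then
        reboot
    fi
fi
```

    The pattern is anchored at the start of the line so that `SwapCached:` is not counted along with `Cached:`.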

  • If you don't really need Kodi running (it is not needed for TVH), then shut it down; it may solve your issue.

    This could also be a good idea, and it is simple to implement where applicable.

    But I am afraid my only recourse will be to schedule, e.g., a monthly reboot when the RPi is free.
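
    A scheduled reboot could be tied to uptime rather than to a calendar date, so a reboot that already happened for another reason resets the clock. This is a rough sketch: the 30-day limit and the `--run` switch are hypothetical, and it does not check whether a recording is in progress, so it would need to be called at a time when the RPi is known to be idle.

```shell
#!/bin/sh
# Sketch: reboot once uptime reaches a given number of days.
# MAX_DAYS is an assumed value, not taken from the thread.
MAX_DAYS="${MAX_DAYS:-30}"

# Print whole days of uptime, read from an uptime-style file.
# $1 = file to read, defaults to /proc/uptime (first field is seconds).
uptime_days() {
    awk '{print int($1 / 86400)}' "${1:-/proc/uptime}"
}

# Only act when called with --run, e.g. from a daily cron entry.
if [ "${1:-}" = "--run" ]; then
    if [ "$(uptime_days)" -ge "$MAX_DAYS" ]; then
        reboot
    fi
fi
```

    Since the hangs appear after 3-4 months, a 30-day limit leaves a wide margin; the right value is a guess until the cause is known.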