LE11 Nightly - Samba server freezes, including logs - NTFS3 KERNEL DRIVER IS SLOW

  • not sure what options you actually use to mount and which are there due to defaults, I would try it with a bare minimum

    vers=3.1.1 -> vers=3.0

    cache=strict

    serverino

    mapposix

  • not sure what options you actually use to mount and which are there due to defaults, I would try it with a bare minimum

    vers=3.1.1 -> vers=3.0

    cache=strict

    serverino

    mapposix

    I have previously tried downgrading vers progressively down to 2.0, with identical results.

    I can try with the other options removed.

    BTW, the mount options are defaults from DietPi.
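
    For anyone wanting to try the bare-minimum approach methodically, here is a small sketch that prints one mount command per option set, so each variant can be tested in turn. The helper names and the mount point /mnt/content are assumptions for illustration; the share address is the one that appears in the client logs in this thread, and authentication options are deliberately left out.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a CIFS mount command for one option set.
cifs_mount_cmd() {
    share=$1        # e.g. //10.25.25.2/content
    mountpoint=$2   # e.g. /mnt/content (assumed path)
    extra=${3:-}    # optional comma-separated extra options
    opts="vers=3.0"
    if [ -n "$extra" ]; then
        opts="$opts,$extra"
    fi
    echo "mount -t cifs $share $mountpoint -o $opts"
}

# Print the bare-minimum command first, then one variant per extra option,
# so a stall can be tied to a single option if it only shows up with it.
print_variants() {
    cifs_mount_cmd "$1" "$2"
    for opt in cache=strict serverino mapposix; do
        cifs_mount_cmd "$1" "$2" "$opt"
    done
}
```

    Running print_variants //10.25.25.2/content /mnt/content only prints the four commands; they still have to be run by hand (as root, with whatever credentials the share needs).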

  • and which are there due to defaults

    from my Fedora 35 box:

    testparm -s => what is set in *my* smb.conf

    testparm -vs => what is set as default plus what is set in *my* smb.conf

    How much you want to bet that the error won't crop up for hours, or at least before the /storage partition of my SD card is full?

    such bets are illegal (correct wording ?) cause only *you* know how big your disk is ! ;)
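
    The two testparm calls mentioned above can be compared to see which parameters are only there as defaults. A minimal sketch, assuming both outputs were first saved to files; the function name and file names are made up for illustration, and it assumes a parameter is printed identically by both calls.

```shell
#!/bin/sh
# Hypothetical helper: list lines that appear in the verbose dump
# (testparm -vs) but not in the plain dump (testparm -s), i.e. parameters
# that are in effect only because they are defaults.
#   $1 = file holding the output of: testparm -s  2>/dev/null
#   $2 = file holding the output of: testparm -vs 2>/dev/null
defaults_only() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -vxF -f "$1" "$2"
}
```

    Typical use would be testparm -s 2>/dev/null > set.txt, then testparm -vs 2>/dev/null > all.txt, then defaults_only set.txt all.txt.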

  • such bets are illegal (correct wording ?) cause only *you* know how big your disk is ! ;)

    /dev/mmcblk0p2 13.5G 2.1G 11.4G 16% /storage

    kodiplayer:~ # ls -lah log.10.25.25.1

    -rw-r--r-- 1 root root 1.9G Feb 14 16:23 log.10.25.25.1

    No drops... I've tried rebooting both machines a few times... Grrrrr.... Sad, considering that it was dropping constantly while I was setting up the client specific logging!

    [MANY HOURS LATER] I'm going to believe that logging fixes the problem at this rate! Still no drop, at, all... Solution: client specific log at level 10 redirected to /dev/null! Only joking....

    Edited 2 times, last by camelreef (February 14, 2022 at 9:19 PM).

  • Ah!

    On the client:

    Code
    Feb 14 21:46:10 grabber kernel: CIFS: Attempting to mount //10.25.25.2/content
    Feb 14 21:55:20 grabber kernel: CIFS: VFS: \\10.25.25.2 sends on sock 00000000d18678b6 stuck for 15 seconds
    Feb 14 21:55:20 grabber kernel: CIFS: VFS: \\10.25.25.2 Error -11 sending data on socket to server
    Feb 14 21:58:32 grabber kernel: CIFS: VFS: \\10.25.25.2 has not responded in 180 seconds. Reconnecting...
    Feb 14 21:58:58 grabber kernel: CIFS: VFS: \\10.25.25.2 sends on sock 000000009546efff stuck for 15 seconds
    Feb 14 21:58:58 grabber kernel: CIFS: VFS: \\10.25.25.2 Error -11 sending data on socket to server
    Feb 14 21:59:27 grabber kernel: CIFS: VFS: \\10.25.25.2 sends on sock 0000000022e8f049 stuck for 15 seconds
    Feb 14 21:59:27 grabber kernel: CIFS: VFS: \\10.25.25.2 Error -11 sending data on socket to server

    On the server:

    https://youplala.net/~nico/log.10.25.25.1

    The two machines agree on time (NTP).

    Again, it's as if smbd is taking a break without noticing...
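
    To line the client-side stalls up against the server log, it helps to pull only the stall-related lines out of the kernel log. A small sketch, assuming the messages keep the wording shown in the log above; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: keep only the CIFS kernel messages that indicate a
# stalled or unresponsive server (wording taken from the log lines above).
cifs_stalls() {
    grep -E 'CIFS: VFS: .*(stuck for [0-9]+ seconds|has not responded in [0-9]+ seconds|Error -[0-9]+ sending data)'
}
```

    It reads from standard input, so it can be fed from dmesg | cifs_stalls or journalctl -k | cifs_stalls, depending on what the client offers.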

  • So, guys, any indication about generating some useful data about smbd? Some external monitoring, as getting anything internally appears to be hard...

    I'll take the previous logs to the Samba guys in a bug report, see what they say. I'll add the link here when done.
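
    On the external monitoring question, one simple option is a probe on the client that times a trivial operation on the mounted share and logs it with a timestamp, so pauses can be matched against the smbd log. This is only a sketch: the function names are made up, the resolution is one second, and client-side caching may hide short stalls.

```shell
#!/bin/sh
# Hypothetical probe: time one directory listing on the share and print a
# timestamped line with the exit status and the elapsed whole seconds.
probe_once() {
    target=$1
    start=$(date +%s)
    ls "$target" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$(date '+%Y-%m-%d %H:%M:%S') target=$target status=$status seconds=$((end - start))"
}

# Repeat the probe forever; $2 is the pause between probes (default 5 s).
probe_loop() {
    while :; do
        probe_once "$1"
        sleep "${2:-5}"
    done
}
```

    Something like probe_loop /mnt/content 5 >> /tmp/smb-probe.log (both paths are assumptions) leaves a log where any line with a large seconds= value marks a pause as seen from the client.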

  • Can you believe that I have not yet been offered a Samba bugzilla account or got any feedback from them after 3 attempts?

    Also, the Samba 4.16 inclusion in LE11 must have taken a back seat in favour of more pressing matters.

  • Can you believe that I have not yet been offered a Samba bugzilla account or got any feedback from them after 3 attempts?

    if their bug system is broken, just try the mailing list

    Samba Bug Report HOWTO

    it's sadly quite normal that Linux stuff has no user-friendly way to report bugs etc

    sometimes mailing lists are the only way, welcome to 1990 :)

  • if their bug system is broken, just try the mailing list

    https://www.samba.org/samba/bugreports.html

    it's sadly quite normal that Linux stuff has no user-friendly way to report bugs etc

    sometimes mailing lists are the only way, welcome to 1990 :)

    See, your negativity shamed them enough to finally send me an email and allow me to open an account! ^^

    14988 – smbd pauses activity for 10s of seconds before resuming, silently

    Let's see what happens now

  • Can you believe that I have not yet been offered a Samba bugzilla account or got any feedback from them after 3 attempts?

    Also, the Samba 4.16 inclusion in LE11 must have taken a back seat in favour of more pressing matters.

    ;) on my work list … not forgotten, but kernel 5.16.11 and gcc / glib and binutils trumped it. ETA on 4.16 in nightlies would be mid/late March at this stage. Waiting till 4.13 goes EOL and I push the final 4.13 into LE11/LE10 - then I can drop 4.16 in. There is still a compile error (I probably need to update the patch - I only did the basics on the PR to get it unpacking and compiling as my WIP)

  • Oooooh.... Interesting turn!

    The Samba guys noticed that the disk subsystem was very slow and offered options to set.

    Right... It's fine adapting smbd to a slow disk... But is it really slow?

    So, let's test the disk subsystem!

    LE11 RPi 4 system

    That is indeed quite poor! This is a new 5TB WD 2.5" USB 3 external HDD, one NTFS partition mounted using ntfs3, connected to one of the USB 3 sockets of the Pi 4.

    Just for perspective, I've done a perf test on another Linux box of mine, with a much older 1 TB WD 2.5" USB 3 external HDD, one NTFS partition mounted using ntfs-3g, so quite similar:

    Reference system

    So, that's read speed.

    Let's take a look at writing and reading on the actual file system:

    LE11 RPi 4 system

    Code
    kodiplayer:/var/media/content # sync && echo 3 > /proc/sys/vm/drop_caches && dd if=/dev/zero of=testfile bs=128k count=16k && ls -lah testfile && dd if=testfile of=/dev/null bs=128k && rm testfile
    16384+0 records in
    16384+0 records out
    2147483648 bytes (2.0GB) copied, 186.714454 seconds, 11.0MB/s
    -rw-r--r--    1 root     root        2.0G Feb 24 10:24 testfile
    16384+0 records in
    16384+0 records out
    2147483648 bytes (2.0GB) copied, 274.038229 seconds, 7.5MB/s

    Reference system

    Code
    root@rastaman:/mnt/Blue-1TB# sync && echo 3 > /proc/sys/vm/drop_caches && dd if=/dev/zero of=testfile bs=128k count=16k && ls -lah testfile && dd if=testfile of=/dev/null bs=128k && rm testfile
    16384+0 records in
    16384+0 records out
    2147483648 bytes (2.1 GB, 2.0 GiB) copied, 27.9964 s, 76.7 MB/s
    -rwxrwxrwx 1 nico root 2.0G Feb 24 10:21 testfile
    16384+0 records in
    16384+0 records out
    2147483648 bytes (2.1 GB, 2.0 GiB) copied, 42.7862 s, 50.2 MB/s

    Well... That's not glorious for the LE11 RPi 4!

    Remember that this is a H/W rev. 1.5 RPi 4 (running nightly-20220219-ae4b7da). I have a carbon copy system in production elsewhere, but with H/W rev. 1.2 (running LE11 nightly-20220120-f7f2fd5; I'm a bit scared of upgrading it, as it works):

    Code
    LibreELEC:/var/media/Media # sync && echo 3 > /proc/sys/vm/drop_caches && dd if=/dev/zero of=testfile bs=128k count=16k && ls -lah testfile && dd if=testfile of=/dev/null bs=128k && rm testfile
    16384+0 records in
    16384+0 records out
    2147483648 bytes (2.0GB) copied, 17.060671 seconds, 120.0MB/s
    -rw-r--r--    1 root     root        2.0G Feb 24 10:36 testfile
    16384+0 records in
    16384+0 records out
    2147483648 bytes (2.0GB) copied, 0.748839 seconds, 2.7GB/s

    A striking difference!

    I'm probably going to upgrade the production system to the same LE11 nightly build as the problematic system. I'm not too thrilled, but it will help knowing if it's a software issue...

    So, maybe not a Samba issue, but a disk I/O performance issue...
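
    One caveat when comparing the dd figures above: the numbers are consistent with the LE11 (BusyBox) dd reporting binary megabytes and the reference system's GNU dd reporting decimal megabytes, so the rates are not quite in the same unit. A small sketch to recompute both from the raw byte count and elapsed time; the function names are made up.

```shell
#!/bin/sh
# Hypothetical helpers: recompute throughput from the "bytes copied" and
# "seconds" fields of a dd summary line, in a single chosen unit.
mib_per_s() {   # binary: 1 MiB = 1048576 bytes
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / 1048576 / s }'
}
mb_per_s() {    # decimal: 1 MB = 1000000 bytes
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / 1000000 / s }'
}

# Write speeds from the runs above:
mib_per_s 2147483648 186.714454   # LE11 RPi 4 rev. 1.5, prints 11.0
mb_per_s  2147483648 27.9964      # reference system as reported, prints 76.7
mib_per_s 2147483648 27.9964      # reference system in MiB/s, prints 73.2
```

    The gap between the two systems stays huge either way; the conversion only matters when the numbers are close.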

  • you mount an NTFS disk on Linux and use it as a native device? That's basically been a no-no for a long time, because the NTFS FUSE driver is slow and not feature complete

    this may change with more recent kernels, where a proper NTFS driver was finally merged

    can you try an ext4 formatted disk?

  • you mount an NTFS disk on Linux and use it as a native device? That's basically been a no-no for a long time, because the NTFS FUSE driver is slow and not feature complete

    this may change with more recent kernels, where a proper NTFS driver was finally merged

    can you try an ext4 formatted disk?

    I do not understand your comment.

    I use an NTFS formatted disk for practical reasons, as it is meant to be plugged regularly on Windows machines.

    Also, LE11 uses ntfs3, which is, I believe, the proper NTFS driver you mention.

    Finally, my reference system uses the old FUSE ntfs-3g driver, and gets acceptable performance. Another system uses the same ntfs3 driver and gets acceptable performance.

    I will try an ext4 formatted disk, but it will just be a data point, I need NTFS for my purpose (I also believe that NTFS disks usage is to be expected around LE).

    I will also try another NTFS formatted disk I have spare and compare perfs, as another data point.
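
    Since the comparison depends on which NTFS driver each box actually uses, it can be checked from the mount table: a kernel ntfs3 mount shows the type ntfs3, while an ntfs-3g (FUSE) mount normally shows up as fuseblk. A minimal sketch; the function name is hypothetical, and mount points containing spaces would need the escaped form used in /proc/mounts.

```shell
#!/bin/sh
# Hypothetical helper: print the filesystem type recorded for a mount point.
#   $1 = mount point, $2 = mount table to read (defaults to /proc/mounts)
fs_type_of() {
    mountpoint=$1
    table=${2:-/proc/mounts}
    # Field 2 is the mount point, field 3 the filesystem type; if the same
    # mount point is listed more than once, the last entry wins.
    awk -v mp="$mountpoint" '$2 == mp { t = $3 } END { if (t != "") print t }' "$table"
}
```

    For example, fs_type_of /var/media/content on the LE11 box and fs_type_of /mnt/Blue-1TB on the reference system.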

  • 5TB WD 2.5" USB 3 external HDD

    if it's a "WD_BLACK P10 Game Drive" (WDBA3A0050BBK-WESN):

    - have CMR and

    - support TRIM (=> untrimmed => write performance ?)

    NUC8, yesterdays nightly, the above disk with ext4

    ===========================================

    hdparm --direct -tT /dev/sdb

    /dev/sdb:

    Timing O_DIRECT cached reads: 586 MB in 2.00 seconds = 293.13 MB/sec

    Timing O_DIRECT disk reads: 350 MB in 3.00 seconds = 116.54 MB/sec


    LibreELEC:/var/media/BackupHD # dd if=/dev/zero of=tempfile bs=1MB count=10240

    10240000000 bytes (9.5GB) copied, 78.312656 seconds, 124.7MB/s

    LibreELEC:/var/media/BackupHD # dd if=tempfile of=/dev/zero bs=1MB count=10240

    10240000000 bytes (9.5GB) copied, 103.853891 seconds, 94.0MB/s

    ===

    man hdparm:

    --direct

    Use the kernel's "O_DIRECT" flag when performing a -t timing test. This bypasses the page cache, causing the reads to go directly from the drive into hdparm's buffers, using so-called "raw" I/O. In many cases, this can produce results that appear much faster than the usual page cache method, giving a better indication of raw device and driver performance.

    sudo hdparm --direct -tT /dev/sdX

    might (I'm unsure) make the "echo 3 > /proc/sys/vm/drop_caches" superfluous ?
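
    As far as I understand it, --direct only affects hdparm's own reads from the block device, so it would not replace dropping caches for the dd tests that go through the file system. Those dd runs earlier in the thread drop caches before the write but not between the write and the read-back, so the read may be served partly from RAM (the 2.7GB/s result looks like that). A sketch of a variant that flushes in between; the function name and the temporary file name are made up, and it assumes a dd that accepts bs=1M and conv=fsync.

```shell
#!/bin/sh
# Hypothetical benchmark: write a test file, flush and drop caches, then
# read it back, printing the dd summary line for each phase.
#   $1 = directory on the file system under test
#   $2 = size in MiB (default 2048)
bench_fs() {
    dir=$1
    mib=${2:-2048}
    file="$dir/ddbench.$$"
    # conv=fsync makes dd flush the data to disk before reporting its timing
    LC_ALL=C dd if=/dev/zero of="$file" bs=1M count="$mib" conv=fsync 2>&1 | tail -n 1
    sync
    # Dropping caches needs root; warn instead of failing if it is not allowed
    { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null ||
        echo "warning: could not drop caches, read figure may be cached" >&2
    LC_ALL=C dd if="$file" of=/dev/null bs=1M 2>&1 | tail -n 1
    rm -f "$file"
}
```

    Run as root, for example bench_fs /var/media/content 2048; the first line printed is the write result and the second the read result.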