LE11 Nightly - Samba server freezes, including logs - NTFS3 KERNEL DRIVER IS SLOW

**CvH** · February 14, 2022 at 11:23 AM

not sure which options you actually use to mount and which are there due to defaults, I would try it with a bare minimum

vers=3.1.1 -> vers=3.0

~~cache=strict~~

~~serverino~~

~~mapposix~~
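
A minimal sketch of what such a bare-minimum mount could look like. The share `//10.25.25.2/content` is the one that shows up in the client log later in this thread; the mount point `/mnt/content` and the credentials file path are hypothetical placeholders. The helper only prints the command so it can be reviewed before being run as root.

```shell
#!/bin/sh
# Build (but do not run) a stripped-down CIFS mount command.
# Only vers= and credentials= are passed; cache=, serverino and mapposix
# are left out so the kernel client uses its own defaults for them.
minimal_cifs_mount() {
    share=$1        # e.g. //10.25.25.2/content (seen in the client log)
    mountpoint=$2   # hypothetical, e.g. /mnt/content
    creds=$3        # hypothetical credentials file
    printf 'mount -t cifs %s %s -o vers=3.0,credentials=%s\n' \
        "$share" "$mountpoint" "$creds"
}

minimal_cifs_mount //10.25.25.2/content /mnt/content /root/.smbcredentials
```

If the drops persist with this reduced option set, the removed options can probably be ruled out as the cause.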

camelreef · February 14, 2022 at 12:45 PM

Quote from CvH

not sure which options you actually use to mount and which are there due to defaults, I would try it with a bare minimum
vers=3.1.1 -> vers=3.0
~~cache=strict~~
~~serverino~~
~~mapposix~~

I have previously tried downgrading vers progressively down to 2.0, with identical results.

I can try with the other options removed.

BTW, the mount options are defaults from DietPi.

GDPR-7 · February 14, 2022 at 4:14 PM

Quote from CvH

and which are there due defaults

from my Fedora 35 box:

testparm -s => what is set in *my* smb.conf

testparm -vs => what is set as default plus what is set in *my* smb.conf
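
To see only the built-in defaults, the two dumps can be compared. This is a rough sketch, assuming both dumps were saved to files (the file names in the usage comment are hypothetical) and that testparm formats a given setting identically in both. A parameter set explicitly to its default value will not show up in the result.

```shell
#!/bin/sh
# List lines that appear only in the verbose dump, i.e. built-in defaults.
# $1: saved output of "testparm -s", $2: saved output of "testparm -vs".
defaults_only() {
    # -v invert, -x whole-line match, -F fixed strings, -f patterns from file
    grep -vxFf "$1" "$2"
}

# Typical use on the Samba server:
#   testparm -s  2>/dev/null > /tmp/explicit.txt
#   testparm -vs 2>/dev/null > /tmp/effective.txt
#   defaults_only /tmp/explicit.txt /tmp/effective.txt
```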

Quote from camelreef

How much you want to bet that the error won't crop up for hours, or at least before the /storage partition of my SD card is full?

such bets are illegal (correct wording ?) cause only *you* know how big your disk is !

camelreef · February 14, 2022 at 4:24 PM

Quote from JoeAverage

such bets are illegal (correct wording ?) cause only *you* know how big your disk is !

/dev/mmcblk0p2 13.5G 2.1G 11.4G 16% /storage

kodiplayer:~ # ls -lah log.10.25.25.1

-rw-r--r-- 1 root root 1.9G Feb 14 16:23 log.10.25.25.1

No drops... I've tried rebooting both machines a few times... Grrrrr.... Sad, considering that it was dropping constantly while I was setting up the client specific logging!

[MANY HOURS LATER] I'm going to believe that logging fixes the problem at this rate! Still no drop, at, all... Solution: client specific log at level 10 redirected to /dev/null! Only joking....

camelreef · February 14, 2022 at 10:09 PM

Ah!

On the client:

Code

Feb 14 21:46:10 grabber kernel: CIFS: Attempting to mount //10.25.25.2/content
Feb 14 21:55:20 grabber kernel: CIFS: VFS: \\10.25.25.2 sends on sock 00000000d18678b6 stuck for 15 seconds
Feb 14 21:55:20 grabber kernel: CIFS: VFS: \\10.25.25.2 Error -11 sending data on socket to server
Feb 14 21:58:32 grabber kernel: CIFS: VFS: \\10.25.25.2 has not responded in 180 seconds. Reconnecting...
Feb 14 21:58:58 grabber kernel: CIFS: VFS: \\10.25.25.2 sends on sock 000000009546efff stuck for 15 seconds
Feb 14 21:58:58 grabber kernel: CIFS: VFS: \\10.25.25.2 Error -11 sending data on socket to server
Feb 14 21:59:27 grabber kernel: CIFS: VFS: \\10.25.25.2 sends on sock 0000000022e8f049 stuck for 15 seconds
Feb 14 21:59:27 grabber kernel: CIFS: VFS: \\10.25.25.2 Error -11 sending data on socket to server

On the server:

https://youplala.net/~nico/log.10.25.25.1

The two machines agree on time (NTP).

Again, it's as if smbd is taking a break without noticing...
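
Since both machines agree on time, the stall timestamps on the client can be used to find the matching region in the server log. A small sketch that pulls those timestamps out of syslog-style lines; the function name is made up, and the sample input is copied from the client log above.

```shell
#!/bin/sh
# Print the time of every CIFS "stuck" / "has not responded" kernel message,
# followed by a count. Reads syslog-style lines on stdin.
cifs_stalls() {
    awk '/CIFS: VFS:/ && (/stuck for/ || /has not responded/) {
             print $3; n++
         }
         END { print n + 0, "stall events" }'
}

# Sample lines copied from the client log above.
cifs_stalls <<'EOF'
Feb 14 21:46:10 grabber kernel: CIFS: Attempting to mount //10.25.25.2/content
Feb 14 21:55:20 grabber kernel: CIFS: VFS: \\10.25.25.2 sends on sock 00000000d18678b6 stuck for 15 seconds
Feb 14 21:55:20 grabber kernel: CIFS: VFS: \\10.25.25.2 Error -11 sending data on socket to server
Feb 14 21:58:32 grabber kernel: CIFS: VFS: \\10.25.25.2 has not responded in 180 seconds. Reconnecting...
EOF
```

Note that a "stuck for 15 seconds" message is logged at the end of the stall, so the interesting part of the server log starts roughly 15 seconds before each printed time.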

camelreef · February 16, 2022 at 3:38 PM

So, guys, any pointers on generating some useful data about smbd? Some external monitoring, as getting anything internally appears to be hard...
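
One possible form of external monitoring, sketched under assumptions: sample the kernel's view of each smbd process once per second and log its state. The function names are made up; `/proc/<pid>/stat` and `pidof` are standard on Linux. A process sitting in state `D` (uninterruptible sleep) during a stall would point at it waiting on I/O rather than at Samba's own logic.

```shell
#!/bin/sh
# Extract the one-letter process state from a /proc/<pid>/stat line.
# The command name sits in parentheses and may itself contain spaces or
# parentheses, so strip everything up to the last ") " first.
stat_state() {
    sed 's/^.*) //' | cut -d' ' -f1
}

# Log the state of every smbd process once per second (runs until stopped).
# Timestamps use the same layout as the syslog lines for easy correlation.
watch_smbd() {
    while :; do
        for pid in $(pidof smbd); do
            [ -r "/proc/$pid/stat" ] || continue
            printf '%s pid=%s state=%s\n' \
                "$(date '+%b %d %H:%M:%S')" "$pid" \
                "$(stat_state < "/proc/$pid/stat")"
        done
        sleep 1
    done
}

# Demonstration on a made-up stat line (not taken from the thread):
echo '1234 (smbd) D 1 1234 1234 0 -1 4194560' | stat_state
```

On the server, `watch_smbd > /storage/smbd-state.log` (path hypothetical) could be left running until the next drop.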

I'll take the previous logs to the Samba guys in a bug report, see what they say. I'll add the link here when done.

GDPR-7 · February 16, 2022 at 7:13 PM

camelreef

heitbaum is preparing the nightlies to run the latest Samba

[le11] samba: update to 4.16.y by heitbaum · Pull Request #6241 · LibreELEC/LibreELEC.tv

Draft PR of samba 4.16 - proposed next release of samba for LE11:master. Expect that we track samba releases within LE:master until freeze in preparation for…


camelreef · February 17, 2022 at 1:15 PM

Quote from JoeAverage

camelreef
heitbaum is preparing nightly to run last samba
https://github.com/LibreELEC/LibreELEC.tv/pull/6241

This is giving me hope! Thanks for the tip!

GDPR-7 · February 19, 2022 at 8:14 PM

is the link to the samba bug report available now ?

camelreef · February 21, 2022 at 3:57 PM

Quote from JoeAverage

is the link to the samba bug report available now ?

I wish!

I've sent the email explaining why I'd like a bugzilla account, and never heard back, twice... A bit of a weird process...

camelreef · February 23, 2022 at 9:27 AM

Can you believe that I have not yet been offered a Samba bugzilla account or got any feedback from them after 3 attempts?

Also, the Samba 4.16 inclusion in LE11 must have taken a back seat in favour of more pressing matters.

**CvH** · February 23, 2022 at 11:35 AM

Quote from camelreef

Can you believe that I have not yet been offered a Samba bugzilla account or got any feedback from them after 3 attempts?

if their bug system is broken, just try the mailing list

Samba Bug Report HOWTO

it's sadly quite normal that Linux stuff has no user-friendly way to report bugs etc.

sometimes mailing lists are the only way, welcome to 1990

camelreef · February 23, 2022 at 12:43 PM

Quote from CvH

if their bug system is broken, just try the mailing list
https://www.samba.org/samba/bugreports.html
it's sadly quite normal that Linux stuff has no user-friendly way to report bugs etc.
sometimes mailing lists are the only way, welcome to 1990

See, your negativity shamed them enough to finally send me an email and allow me to open an account!

14988 – smbd pauses activity for 10s of seconds before resuming, silently

Let's see what happens now

**heitbaum** · February 23, 2022 at 2:15 PM

Quote from camelreef

Can you believe that I have not yet been offered a Samba bugzilla account or got any feedback from them after 3 attempts?
Also, the Samba 4.16 inclusion in LE11 must have taken a back seat in favour of more pressing matters.

on my work list … not forgotten, but kernel 5.16.11 and gcc / glib and binutils trumped it. ETA on 4.16 in nightlies would be mid/late March at this stage. Waiting till 4.13 goes EOL and I push the final 4.13 into LE11/LE10 - then I can drop 4.16 in. There is still a compile error (probably I need to update the patch - but I only did the basics on the PR to get it unpacking and compiling as my WIP)

camelreef · February 24, 2022 at 10:41 AM

Oooooh.... Interesting turn!

The Samba guys noticed that the disk subsystem was very slow and offered options to set.

Right... It's fine adapting smbd to a slow disk... But is it really slow?

So, let's test the disk subsystem!

LE11 RPi 4 system

Code

/dev/sda2 on /var/media/content type ntfs3 (rw,relatime,uid=0,gid=0,fmask=37777600133,iocharset=utf8)

kodiplayer:~ # hdparm -Ttv /dev/sda2

/dev/sda2:
 multcount     =  0 (off)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 4769073/64/32, sectors = 9767061504, start = 411648
 Timing cached reads:   2030 MB in  2.00 seconds = 1016.80 MB/sec
 Timing buffered disk reads: 162 MB in  3.01 seconds =  53.83 MB/sec

That is indeed quite poor! This is a new 5TB WD 2.5" USB 3 external HDD, one NTFS partition mounted using ntfs3, connected to one of the USB 3 sockets of the Pi 4.

Just for perspective, I've done a perf test on another Linux box of mine, with a much older 1 TB WD 2.5" USB 3 external HDD, one NTFS partition mounted using ntfs-3g, so quite similar:

Reference system

Code

/dev/sdb1 on /mnt/Blue-1TB type fuseblk (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,blksize=4096)

13 nico@rastaman:~$ sudo hdparm -Ttv /dev/sdb1
[sudo] password for nico: 

/dev/sdb1:
 multcount     =  0 (off)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 121597/255/63, sectors = 1953458174, start = 2
 Timing cached reads:   22072 MB in  1.99 seconds = 11092.93 MB/sec
 Timing buffered disk reads: 324 MB in  3.01 seconds = 107.58 MB/sec

So, that's read speed.

Let's take a look at writing and reading on the actual file system:

LE11 RPi 4 system

Code

kodiplayer:/var/media/content # sync && echo 3 > /proc/sys/vm/drop_caches && dd if=/dev/zero of=testfile bs=128k count=16k && ls -lah testfile && dd if=testfile of=/dev/null bs=128k && rm testfile
16384+0 records in
16384+0 records out
2147483648 bytes (2.0GB) copied, 186.714454 seconds, 11.0MB/s
-rw-r--r--    1 root     root        2.0G Feb 24 10:24 testfile
16384+0 records in
16384+0 records out
2147483648 bytes (2.0GB) copied, 274.038229 seconds, 7.5MB/s

Reference system

Code

root@rastaman:/mnt/Blue-1TB# sync && echo 3 > /proc/sys/vm/drop_caches && dd if=/dev/zero of=testfile bs=128k count=16k && ls -lah testfile && dd if=testfile of=/dev/null bs=128k && rm testfile
16384+0 records in
16384+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 27.9964 s, 76.7 MB/s
-rwxrwxrwx 1 nico root 2.0G Feb 24 10:21 testfile
16384+0 records in
16384+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 42.7862 s, 50.2 MB/s

Well... That's not glorious for the LE11 RPi 4!
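
A side note on comparing these figures: the two dd builds do not appear to report the same unit. The LE11 numbers are consistent with 1024-based megabytes (2048 MiB in 186.7 s is about 11.0), while the GNU dd on the reference system prints decimal MB (2147.5 MB in 28.0 s is about 76.7). A small sketch that recomputes everything in MiB/s from the raw byte counts and durations; the function name is made up.

```shell
#!/bin/sh
# Throughput in MiB/s (1 MiB = 1048576 bytes) from a byte count and seconds.
mib_per_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / 1048576 / s }'
}

# Byte counts and durations copied from the dd runs above.
echo "LE11 RPi 4 write: $(mib_per_s 2147483648 186.714454) MiB/s"
echo "LE11 RPi 4 read:  $(mib_per_s 2147483648 274.038229) MiB/s"
echo "Reference write:  $(mib_per_s 2147483648 27.9964) MiB/s"
echo "Reference read:   $(mib_per_s 2147483648 42.7862) MiB/s"
```

The gap between the two systems stays large either way; normalising only keeps the unit difference from exaggerating it.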

Remember that this is a H/W rev. 1.5 RPi 4 (running nightly-20220219-ae4b7da). I have a carbon copy system in production elsewhere, but with H/W Rev. 1.2 (running LE11 nightly-20220120-f7f2fd5; I'm a bit scared of upgrading it, it works):

Code

LibreELEC:/var/media/Media # sync && echo 3 > /proc/sys/vm/drop_caches && dd if=/dev/zero of=testfile bs=128k count=16k && ls -lah testfile && dd if=testfile of=/dev/null bs=128k && rm testfile
16384+0 records in
16384+0 records out
2147483648 bytes (2.0GB) copied, 17.060671 seconds, 120.0MB/s
-rw-r--r--    1 root     root        2.0G Feb 24 10:36 testfile
16384+0 records in
16384+0 records out
2147483648 bytes (2.0GB) copied, 0.748839 seconds, 2.7GB/s

A striking difference!

I'm probably going to upgrade the production system to the same LE11 nightly build as the problematic system. I'm not too thrilled, but it will help knowing if it's a software issue...

So, maybe not a Samba issue, but a disk I/O performance issue...

**CvH** · February 24, 2022 at 12:13 PM

you mount an NTFS disk on linux and use it as a native device? that's basically been a no-no for a long time, because the NTFS fuse driver is slow and not feature complete

this may change with more recent kernels where finally a proper NTFS driver was merged

can you try an ext4 formatted disk?

camelreef · February 24, 2022 at 2:35 PM

Quote from CvH

you mount an NTFS disk on linux and use it as a native device? that's basically been a no-no for a long time, because the NTFS fuse driver is slow and not feature complete
this may change with more recent kernels where finally a proper NTFS driver was merged
can you try an ext4 formatted disk?

I do not understand your comment.

I use an NTFS formatted disk for practical reasons, as it is meant to be plugged regularly on Windows machines.

Also, LE11 uses ntfs3, which is, I believe, the proper NTFS driver you mention.

Finally, my reference system uses the old FUSE ntfs-3g driver, and gets acceptable performance. Another system uses the same ntfs3 driver and gets acceptable performance.

I will try an ext4 formatted disk, but it will just be a data point, I need NTFS for my purpose (I also believe that NTFS disks usage is to be expected around LE).

I will also try another NTFS formatted disk I have spare and compare perfs, as another data point.
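
Because the comparison hinges on which driver backs each mount, here is a small sketch for reading the filesystem type out of `mount` output. The function name is made up, and it assumes mount points without spaces. `ntfs3` is the in-kernel driver, while `fuseblk` indicates a FUSE-backed mount such as ntfs-3g.

```shell
#!/bin/sh
# Print the filesystem type for a mount point, given "mount" output on stdin.
# Line format assumed: <device> on <mountpoint> type <fstype> (<options>)
fstype_of() {
    awk -v mp="$1" '$2 == "on" && $3 == mp && $4 == "type" { print $5 }'
}

# The two mount lines quoted earlier in the thread:
sample='/dev/sda2 on /var/media/content type ntfs3 (rw,relatime,uid=0,gid=0,fmask=37777600133,iocharset=utf8)
/dev/sdb1 on /mnt/Blue-1TB type fuseblk (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,blksize=4096)'

printf '%s\n' "$sample" | fstype_of /var/media/content   # in-kernel driver
printf '%s\n' "$sample" | fstype_of /mnt/Blue-1TB        # FUSE-backed mount

# On a live system: mount | fstype_of /var/media/content
```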

GDPR-7 · February 24, 2022 at 4:52 PM

Quote from camelreef

5TB WD 2.5" USB 3 external HDD

if it's an "WD_BLACK P10 Game Drive" (WDBA3A0050BBK-WESN):

- have CMR and

- support TRIM (=> untrimmed => write performance ?)

NUC8, yesterday's nightly, the above disk with ext4

===========================================

hdparm --direct -tT /dev/sdb

/dev/sdb:

Timing O_DIRECT cached reads: 586 MB in 2.00 seconds = 293.13 MB/sec

Timing O_DIRECT disk reads: 350 MB in 3.00 seconds = 116.54 MB/sec

LibreELEC:/var/media/BackupHD # dd if=/dev/zero of=tempfile bs=1MB count=10240

10240000000 bytes (9.5GB) copied, 78.312656 seconds, 124.7MB/s

LibreELEC:/var/media/BackupHD # dd if=tempfile of=/dev/zero bs=1MB count=10240

10240000000 bytes (9.5GB) copied, 103.853891 seconds, 94.0MB/s

===

man hdparm:

--direct

Use the kernel's "O_DIRECT" flag when performing a -t timing test. This bypasses the page cache, causing the reads to go directly from the drive into hdparm's buffers, using so-called "raw" I/O. In many cases, this can produce results that appear much faster than the usual page cache method, giving a better indication of raw device and driver performance.

sudo hdparm --direct -tT /dev/sdX

might (I'm unsure) make the "echo 3 > /proc/sys/vm/drop_caches" superfluous ?
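
As far as the man page excerpt goes, `--direct` only changes how hdparm itself reads the block device, so a file-level dd test still goes through the page cache and dropping caches remains relevant there. The 2.7GB/s read-back reported earlier in the thread is far more than a USB HDD can deliver, which suggests that file was served from RAM. A sketch of the same dd test with caches dropped before each phase; the function names are made up, and `conv=fsync` is assumed to be accepted by the installed dd.

```shell
#!/bin/sh
# Flush dirty pages and, when permitted (root), drop the page cache.
drop_caches() {
    sync
    if [ -w /proc/sys/vm/drop_caches ]; then
        { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null || true
    fi
}

# Write then read back a test file in directory $1, $2 blocks of 128 KiB.
fs_bench() {
    dir=$1
    count=${2:-16384}          # 16384 x 128 KiB = 2 GiB, as in the thread
    f="$dir/testfile"
    drop_caches
    # conv=fsync makes dd flush the file to disk before reporting the rate
    # (assumption: the installed dd supports it).
    dd if=/dev/zero of="$f" bs=128k count="$count" conv=fsync || return 1
    drop_caches                # so the read-back cannot come from RAM
    dd if="$f" of=/dev/null bs=128k || return 1
    rm -f "$f"
}

# Usage (as root, on the mounted disk): fs_bench /var/media/content
```

Without root, the cache drop is silently skipped and the read figure should be treated with suspicion.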