Not sure which options you actually pass to mount and which are only there as defaults; I would try it with a bare minimum:
vers=3.1.1 -> vers=3.0
cache=strict
serverino
mapposix
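A bare-minimum mount for this kind of test could look like the sketch below. The share name comes from the client log later in the thread; the mount point, the credentials file and the function name are hypothetical placeholders. It only prints the command, so it can be checked before running it as root.

```shell
#!/bin/sh
# Share taken from the kernel log in this thread; the other two values
# are hypothetical and must be adapted to the real setup.
SHARE=//10.25.25.2/content
MOUNTPOINT=/mnt/content
CREDS=/root/.smbcredentials

# Print a mount command that requests only the protocol version and
# leaves every other option (cache, serverino, ...) at the kernel default.
minimal_cifs_mount_cmd() {
    # $1: SMB protocol version to request, e.g. 3.0 or 2.1
    echo "mount -t cifs $SHARE $MOUNTPOINT -o vers=$1,credentials=$CREDS"
}

minimal_cifs_mount_cmd 3.0
```

Options such as cache=strict or serverino can then be added back one at a time to see which one, if any, changes the behaviour.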
I have previously tried downgrading vers progressively down to 2.0, with identical results.
I can try with the other options removed.
BTW, the mount options are defaults from DietPi.
and which are there due defaults
from my Fedora 35 box:
testparm -s => what is set in *my* smb.conf
testparm -vs => what is set as default plus what is set in *my* smb.conf
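On the client side, a rough equivalent is to look at the options the kernel reports for the mounted share, which should include the defaults it filled in. A small sketch, assuming the usual /proc/mounts layout (device, mount point, type, options); the function name is made up for illustration:

```shell
#!/bin/sh
# List every cifs mount with its effective options, one option per line.
# $1: mounts table to read (defaults to /proc/mounts)
cifs_mount_opts() {
    awk '$3 == "cifs" {
        print $2
        n = split($4, opt, ",")
        for (i = 1; i <= n; i++) print "  " opt[i]
    }' "${1:-/proc/mounts}"
}

cifs_mount_opts
```

Comparing this list with the options given on the command line or in fstab shows which ones are defaults.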
How much you want to bet that the error won't crop up for hours, or at least before the /storage partition of my SD card is full?
such bets are illegal (correct wording ?) cause only *you* know how big your disk is !
/dev/mmcblk0p2 13.5G 2.1G 11.4G 16% /storage
kodiplayer:~ # ls -lah log.10.25.25.1
-rw-r--r-- 1 root root 1.9G Feb 14 16:23 log.10.25.25.1
No drops... I've tried rebooting both machines a few times... Grrrrr.... Sad, considering that it was dropping constantly while I was setting up the client specific logging!
[MANY HOURS LATER] I'm going to believe that logging fixes the problem at this rate! Still no drop, at, all... Solution: client specific log at level 10 redirected to /dev/null! Only joking....
Ah!
On the client:
Feb 14 21:46:10 grabber kernel: CIFS: Attempting to mount //10.25.25.2/content
Feb 14 21:55:20 grabber kernel: CIFS: VFS: \\10.25.25.2 sends on sock 00000000d18678b6 stuck for 15 seconds
Feb 14 21:55:20 grabber kernel: CIFS: VFS: \\10.25.25.2 Error -11 sending data on socket to server
Feb 14 21:58:32 grabber kernel: CIFS: VFS: \\10.25.25.2 has not responded in 180 seconds. Reconnecting...
Feb 14 21:58:58 grabber kernel: CIFS: VFS: \\10.25.25.2 sends on sock 000000009546efff stuck for 15 seconds
Feb 14 21:58:58 grabber kernel: CIFS: VFS: \\10.25.25.2 Error -11 sending data on socket to server
Feb 14 21:59:27 grabber kernel: CIFS: VFS: \\10.25.25.2 sends on sock 0000000022e8f049 stuck for 15 seconds
Feb 14 21:59:27 grabber kernel: CIFS: VFS: \\10.25.25.2 Error -11 sending data on socket to server
On the server:
https://youplala.net/~nico/log.10.25.25.1
The two machines agree on time (NTP).
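Since both clocks agree, the client-side stall times can be used to find the matching stretch in the large server log. A sketch that pulls only the timestamps of stall, send-error and reconnect messages out of a kernel log read from stdin; the function name is hypothetical and the patterns are based solely on the messages quoted above:

```shell
#!/bin/sh
# Read syslog-style kernel log lines on stdin and print the timestamp
# (month, day, time) of every CIFS stall, send error or reconnect message.
# Consecutive identical timestamps are collapsed.
cifs_stalls() {
    grep -E 'CIFS: VFS: .*(stuck for [0-9]+ seconds|has not responded in [0-9]+ seconds|Error -?[0-9]+ sending data)' \
        | awk '{ print $1, $2, $3 }' \
        | uniq
}
```

For example, `journalctl -k | cifs_stalls` or `dmesg -T | cifs_stalls` (the latter prints a different timestamp format, so the awk fields would need adjusting).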
Again, it's as if smbd is taking a break without noticing...
So, guys, any indication about generating some useful data about smbd? Some external monitoring, as getting anything internally appears to be hard...
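One possible form of external monitoring is a small probe on the client that times a cheap operation on the share at regular intervals, so that pauses show up as latency spikes with a timestamp. This is only a sketch: the function name is made up, and `date +%s%N` assumes GNU date (BusyBox date may not support `%N`).

```shell
#!/bin/sh
# Run the given command once and print:
#   <epoch seconds> <elapsed milliseconds> <exit status>
probe_once() {
    start=$(date +%s%N)          # nanoseconds since the epoch (GNU date)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s%N)
    echo "$(date +%s) $(( (end - start) / 1000000 )) $status"
}
```

It could be run in a loop such as `while sleep 5; do probe_once ls /mnt/content; done >> probe.log`, where /mnt/content is a hypothetical mount point. Directory listings may be answered from the client's cache, so reading a small file is probably a more honest probe.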
I'll take the previous logs to the Samba guys in a bug report, see what they say. I'll add the link here when done.
heitbaum is preparing a nightly build with the latest Samba
This is giving me hope! Thanks for the tip!
is the link to the samba bug report available now ?
I wish!
I've sent the email explaining why I'd like a bugzilla account, and never heard back, twice... A bit of a weird process...
Can you believe that I have not yet been offered a Samba bugzilla account or got any feedback from them after 3 attempts?
Also, the Samba 4.16 inclusion in LE11 must have taken a back seat in favour of more pressing matters.
If their bug system is broken, just try the mailing list.
It's sadly quite normal that Linux projects have no user-friendly way to report bugs etc.
Sometimes mailing lists are the only way, welcome to 1990.
If their bug system is broken, just try the mailing list.
https://www.samba.org/samba/bugreports.html
See, your negativity shamed them enough to finally send me an email and allow me to open an account!
14988 – smbd pauses activity for 10s of seconds before resuming, silently
Let's see what happens now
Also, the Samba 4.16 inclusion in LE11 must have taken a back seat in favour of more pressing matters.
On my work list … not forgotten, but kernel 5.16.11 and gcc / glib and binutils trumped it. ETA on 4.16 in nightlies would be mid/late March at this stage. I'm waiting until 4.13 goes EOL and I push the final 4.13 into LE11/LE10, then I can drop 4.16 in. There is still a compile error (I probably need to update the patch, but I only did the basics on the PR to get it unpacking and compiling as my WIP).
Oooooh.... Interesting turn!
The Samba guys noticed that the disk subsystem was very slow and offered options to set.
Right... It's fine adapting smbd to a slow disk... But is it really slow?
So, let's test the disk subsystem!
LE11 RPi 4 system
/dev/sda2 on /var/media/content type ntfs3 (rw,relatime,uid=0,gid=0,fmask=37777600133,iocharset=utf8)
kodiplayer:~ # hdparm -Ttv /dev/sda2
/dev/sda2:
multcount = 0 (off)
readonly = 0 (off)
readahead = 256 (on)
geometry = 4769073/64/32, sectors = 9767061504, start = 411648
Timing cached reads: 2030 MB in 2.00 seconds = 1016.80 MB/sec
Timing buffered disk reads: 162 MB in 3.01 seconds = 53.83 MB/sec
That is indeed quite poor! This is a new 5TB WD 2.5" USB 3 external HDD, one NTFS partition mounted using ntfs3, connected to one of the USB 3 sockets of the Pi 4.
Just for perspective, I've done a perf test on another Linux box of mine, with a much older 1 TB WD 2.5" USB 3 external HDD, one NTFS partition mounted using ntfs-3g, so quite similar:
Reference system
/dev/sdb1 on /mnt/Blue-1TB type fuseblk (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,blksize=4096)
13 nico@rastaman:~$ sudo hdparm -Ttv /dev/sdb1
[sudo] password for nico:
/dev/sdb1:
multcount = 0 (off)
readonly = 0 (off)
readahead = 256 (on)
geometry = 121597/255/63, sectors = 1953458174, start = 2
Timing cached reads: 22072 MB in 1.99 seconds = 11092.93 MB/sec
Timing buffered disk reads: 324 MB in 3.01 seconds = 107.58 MB/sec
So, that's read speed.
Let's take a look at writing and reading on the actual file system:
LE11 RPi 4 system
kodiplayer:/var/media/content # sync && echo 3 > /proc/sys/vm/drop_caches && dd if=/dev/zero of=testfile bs=128k count=16k && ls -lah testfile && dd if=testfile of=/dev/null bs=128k && rm testfile
16384+0 records in
16384+0 records out
2147483648 bytes (2.0GB) copied, 186.714454 seconds, 11.0MB/s
-rw-r--r-- 1 root root 2.0G Feb 24 10:24 testfile
16384+0 records in
16384+0 records out
2147483648 bytes (2.0GB) copied, 274.038229 seconds, 7.5MB/s
Reference system
root@rastaman:/mnt/Blue-1TB# sync && echo 3 > /proc/sys/vm/drop_caches && dd if=/dev/zero of=testfile bs=128k count=16k && ls -lah testfile && dd if=testfile of=/dev/null bs=128k && rm testfile
16384+0 records in
16384+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 27.9964 s, 76.7 MB/s
-rwxrwxrwx 1 nico root 2.0G Feb 24 10:21 testfile
16384+0 records in
16384+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 42.7862 s, 50.2 MB/s
Well... That's not glorious for the LE11 RPi 4!
Remember that this is a H/W rev. 1.5 RPi 4 (running nightly-20220219-ae4b7da). I have a carbon copy system in production elsewhere, but with H/W rev. 1.2 (running LE11 nightly-20220120-f7f2fd5; I'm a bit scared of upgrading it, it works):
LibreELEC:/var/media/Media # sync && echo 3 > /proc/sys/vm/drop_caches && dd if=/dev/zero of=testfile bs=128k count=16k && ls -lah testfile && dd if=testfile of=/dev/null bs=128k && rm testfile
16384+0 records in
16384+0 records out
2147483648 bytes (2.0GB) copied, 17.060671 seconds, 120.0MB/s
-rw-r--r-- 1 root root 2.0G Feb 24 10:36 testfile
16384+0 records in
16384+0 records out
2147483648 bytes (2.0GB) copied, 0.748839 seconds, 2.7GB/s
A striking difference!
I'm probably going to upgrade the production system to the same LE11 nightly build as the problematic system. I'm not too thrilled, but it will help knowing if it's a software issue...
So, maybe not a Samba issue, but a disk I/O performance issue...
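One caveat on the numbers above: the command drops the page cache only once, before the write, so the read that follows can be served from RAM. A 2.7GB/s read is far more than a USB hard disk can deliver, so that figure presumably measures the cache rather than the disk. A sketch of a variant that flushes the write and drops the cache again before reading; the function name is made up, and it assumes a dd that supports conv=fsync (GNU dd does, BusyBox builds usually do):

```shell
#!/bin/sh
# bench_rw DIR [COUNT]: write, then read back, COUNT blocks of 128k in DIR.
bench_rw() {
    dir=$1
    count=${2:-16384}            # 16384 x 128k = 2 GiB, as in the tests above
    file="$dir/testfile.$$"

    sync
    echo "write:"
    # conv=fsync makes dd flush the data to the device before reporting a rate
    dd if=/dev/zero of="$file" bs=128k count="$count" conv=fsync 2>&1 | tail -n 1

    # Drop the page cache again so the read has to go to the disk (needs root)
    sync
    if ! { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null; then
        echo "warning: cannot drop caches, read figure may come from RAM" >&2
    fi

    echo "read:"
    dd if="$file" of=/dev/null bs=128k 2>&1 | tail -n 1
    rm -f "$file"
}
```

Run as root, for example `bench_rw /var/media/content`, on each of the three systems so the figures are comparable.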
You mount an NTFS disk on Linux and use it as a native device? That has basically been a no-no for a long time, because the NTFS FUSE driver is slow and not feature complete.
This may change with more recent kernels, where a proper NTFS driver was finally merged.
can you try a ext4 formated disk ?
I do not understand your comment.
I use an NTFS formatted disk for practical reasons, as it is meant to be plugged regularly on Windows machines.
Also, LE11 uses ntfs3, which is, I believe, the proper NTFS driver you mention.
Finally, my reference system uses the old FUSE ntfs-3g driver, and gets acceptable performance. Another system uses the same ntfs3 driver and gets acceptable performance.
I will try an ext4 formatted disk, but it will just be a data point, I need NTFS for my purpose (I also believe that NTFS disks usage is to be expected around LE).
I will also try another NTFS formatted disk I have spare and compare perfs, as another data point.
5TB WD 2.5" USB 3 external HDD
if it's a "WD_BLACK P10 Game Drive" (WDBA3A0050BBK-WESN):
- have CMR and
- support TRIM (=> untrimmed => write performance ?)
NUC8, yesterday's nightly, the above disk with ext4
===========================================
hdparm --direct -tT /dev/sdb
/dev/sdb:
Timing O_DIRECT cached reads: 586 MB in 2.00 seconds = 293.13 MB/sec
Timing O_DIRECT disk reads: 350 MB in 3.00 seconds = 116.54 MB/sec
LibreELEC:/var/media/BackupHD # dd if=/dev/zero of=tempfile bs=1MB count=10240
10240000000 bytes (9.5GB) copied, 78.312656 seconds, 124.7MB/s
LibreELEC:/var/media/BackupHD # dd if=tempfile of=/dev/zero bs=1MB count=10240
10240000000 bytes (9.5GB) copied, 103.853891 seconds, 94.0MB/s
===
man hdparm:
--direct
Use the kernel´s "O_DIRECT" flag when performing a -t timing test. This bypasses the page cache, causing the reads to go directly from the
drive into hdparm's buffers, using so-called "raw" I/O. In many cases, this can produce results that appear much faster than the usual page
cache method, giving a better indication of raw device and driver performance.
sudo hdparm --direct -tT /dev/sdX
might (I'm unsure) make the "echo 3 > /proc/sys/vm/drop_caches" superfluous ?