Posts by camelreef

    Right, now using ext4 as the file system on the Samba-shared 5 TB USB 3.0 WD Elements 2.5" external hard drive.

    I'm trying to hammer the system as much as I can using the planned workflow. So far, so good!

    [UPDATE] I've added multiple massive file writes and reads in parallel over the network, on top of the unusually high-load workflow, and everything is holding up like a champ and behaving as expected. I'm getting very realistic, solid and consistent read and write performance numbers.
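    For the curious, the parallel hammering is nothing exotic; a minimal sketch of the sort of load I mean, run from a client over the CIFS mount (the mount point, file names and sizes below are placeholders, not my exact workload):

    Code
    # Sketch only: a few large sequential writes, then reads, in parallel over
    # the CIFS mount (paths and sizes are placeholders, not my exact workload).
    for i in 1 2 3 4; do
        dd if=/dev/zero of=/mnt/content/stress_$i.bin bs=1M count=4096 &
    done
    wait
    for i in 1 2 3 4; do
        dd if=/mnt/content/stress_$i.bin of=/dev/null bs=1M &
    done
    wait
    rm -f /mnt/content/stress_*.bin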

    I'm still holding off on declaring this a successful solution, as the occurrence of the drops was pretty random.

    It looks like the ntfs3 (Paragon) kernel driver gives really bad performance in my context. I'll try to find out when it replaced ntfs-3g (FUSE) as the default driver in LE, as that may explain why the other identical setup worked well for a good while. Ah! December 2021! This correlates!

    Replace ntfs-3g_ntfsprogs with Linux kernel 5.15 ntfs3 by heitbaum · Pull Request #5838 · LibreELEC/LibreELEC.tv
    github.com
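    Side note for anyone checking their own box: which of the two drivers a mount is actually using shows up as the filesystem type, ntfs3 for the Paragon kernel driver, fuseblk for ntfs-3g. A quick sketch (the mount point is just an example):

    Code
    # Sketch: the filesystem type column reads "ntfs3" for the kernel driver
    # and "fuseblk" for the ntfs-3g FUSE driver (mount point is an example).
    grep -E 'ntfs3|fuseblk' /proc/mounts
    mount | grep /var/media/content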

    heitbaum, I don't know how high this will ever rank in the LE crew's priorities, or whether anything can be done about it, but ntfs3 may need attention.

    I'm losing some flexibility by going from NTFS to ext4, but everyone will survive; it's better than losing basic functionality!

    Many thanks to all who have pitched in!

    Quite clear.

    But along the way, data could get silently lost or damaged...

    It's bulky data, but nothing that can't be replaced! That being said, I can't remember the last time cp copied anything other than perfectly.

    It will have taken three days, but I'm nearly done with the partition swap from NTFS to ext4 and the associated data juggling.

    Evaluating how that improves Samba will resume shortly!

    If you need to do it under Windows:

    https://fastcopy.jp/

    Faster copy with verify => unattended copy.

    Thanks! I will take a look in case I need it in the future. However, copying from NTFS to NTFS is only half the story, as I will be using ext4 partitions in the end.

    I'm using a dead-classic cp -rv inside a multiplexed terminal session on a spare Linux box. It will take the time it needs to take, while I live my life normally!
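    If silent corruption during the bulk copy is a worry (the point raised above), a checksum pass afterwards is cheap insurance. A sketch with placeholder paths; rsync -rc would do much the same in one step:

    Code
    # Sketch: copy, then compare checksums of the source and destination trees
    # (paths are placeholders).
    cp -rv /mnt/ntfs_src/content /mnt/ext4_dst/
    ( cd /mnt/ntfs_src/content && find . -type f -exec md5sum {} + | sort -k2 ) > /tmp/src.md5
    ( cd /mnt/ext4_dst/content && find . -type f -exec md5sum {} + | sort -k2 ) > /tmp/dst.md5
    diff /tmp/src.md5 /tmp/dst.md5 && echo "All copies match"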

    NTFS on Linux is really slow; it's not a "native" filesystem (just a FUSE driver) on Linux at the moment, and that's the reason the driver is rather slow.

    So if you have a slow device combined with a slow FS, you get very slow performance :)

    That's why you have a NAS :)

    ntfs3, from Paragon, as currently used in LE11, is not FUSE like ntfs-3g, and is supposed to provide very nice performance (it does on my Mac!) without resorting to user space.

    The system that is performing very poorly is using ntfs3, not the FUSE ntfs-3g.

    I have another identical system (but a different RPi 4 h/w rev.), so also using ntfs3, that is performing well.

    I have provided a reference benchmark on an Ubuntu box running on a NUC7, using ntfs-3g/FUSE, that is performing well, or at least adequately.
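    For an apples-to-apples test on the same box, the driver can also be forced at mount time, assuming both drivers are actually present in the image (they may not both be on LE11; the device and mount point below are placeholders):

    Code
    # Sketch: mount the same partition with each driver in turn and re-run the
    # dd test (device/mount point are placeholders; needs both drivers present).
    mount -t ntfs3 /dev/sda1 /mnt/test       # Paragon kernel driver
    umount /mnt/test
    mount -t ntfs-3g /dev/sda1 /mnt/test     # FUSE driver, via the mount.ntfs-3g helper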

    Both RPis use the exact same external HDD brand/model/size; the NUC uses an older and smaller model from the same brand.

    As for a NAS, I do have one at home (see the signature's link) because I'm a geek :). That little system is for a friend who is very much not a geek, and has no NAS...

    If it's a "WD_BLACK P10 Game Drive" (WDBA3A0050BBK-WESN), it:

    - has CMR, and

    - supports TRIM (=> untrimmed => write performance?); a quick check is sketched below.
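    Whether the drive reports discard/TRIM support through its USB bridge can be checked from sysfs (a sketch; sdX is a placeholder, and lsblk --discard shows the same information where available):

    Code
    # Sketch: non-zero values mean the device reports discard (TRIM) support;
    # sdX is a placeholder for the actual drive.
    cat /sys/block/sdX/queue/discard_max_bytes
    cat /sys/block/sdX/queue/discard_granularity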

    It's a basic and cheap WD Elements spindle. The same in both systems.

    WD Elements Portable USB 3.0 External Hard Drive Storage (1 TB to 5 TB) | Western Digital | Western Digital
    www.westerndigital.com

    sudo hdparm --direct -tT /dev/sdX


    might (I'm unsure) make the "echo 3 > /proc/sys/vm/drop_caches" superfluous?

    In the end, even if the test benefits from some caching, you would expect better numbers than what I am getting on that particular RPi 4 Rev. 1.5.
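    For what it's worth, the dd test itself can be made cache-independent, which sidesteps the drop_caches question entirely: GNU dd (and, I believe, recent busybox builds) accepts direct-I/O flags, so the page cache is bypassed for the test file. A sketch with a placeholder path:

    Code
    # Sketch: direct-I/O variant of the dd test (path is a placeholder).
    # O_DIRECT bypasses the page cache, so no drop_caches dance is needed.
    dd if=/dev/zero of=/var/media/content/testfile bs=1M count=2048 oflag=direct
    dd if=/var/media/content/testfile of=/dev/null bs=1M iflag=direct
    rm /var/media/content/testfile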

    Nico

    You mount an NTFS disk on Linux and use it as a native device? That has basically been a no-no for a long time, because the NTFS FUSE driver is slow and not feature-complete.

    This may change with more recent kernels, where a proper NTFS driver was finally merged.

    Can you try an ext4-formatted disk?

    I do not understand your comment.

    I use an NTFS-formatted disk for practical reasons, as it is meant to be plugged regularly into Windows machines.

    Also, LE11 uses ntfs3, which is, I believe, the proper NTFS driver you mention.

    Finally, my reference system uses the old FUSE ntfs-3g driver and gets acceptable performance. Another system uses the same ntfs3 driver and also gets acceptable performance.

    I will try an ext4-formatted disk, but it will just be a data point; I need NTFS for my purpose (I also believe that NTFS disk usage is to be expected around LE).

    I will also try another spare NTFS-formatted disk I have and compare performance, as another data point.
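    For the ext4 data point, the reformat itself is trivial on any full Linux box; a sketch with placeholder device and label (this obviously wipes the partition, so only once the data has been juggled off it):

    Code
    # Sketch: reformat the partition as ext4 (device/label are placeholders;
    # this destroys whatever is currently on the partition).
    umount /dev/sda1
    mkfs.ext4 -L Content /dev/sda1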

    Oooooh.... Interesting turn!

    The Samba guys noticed that the disk subsystem was very slow and offered options to set.

    Right... It's fine adapting smbd to a slow disk... But is it really slow?

    So, let's test the disk subsystem!
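    First the raw read speed of the device itself, independent of any filesystem; something like the hdparm test mentioned elsewhere in the thread (a sketch; sdX is a placeholder, and --direct keeps the page cache out of the measurement):

    Code
    # Sketch: raw sequential read test of the device, bypassing the filesystem
    # (sdX is a placeholder; --direct uses O_DIRECT, bypassing the page cache).
    hdparm --direct -tT /dev/sdX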

    LE11 RPi 4 system

    That is indeed quite poor! This is a new 5TB WD 2.5" USB 3 external HDD, one NTFS partition mounted using ntfs3, connected to one of the USB 3 sockets of the Pi 4.

    Just to gather some perspective, I've done a perf test on another Linux box of mine, with a much older 1 TB WD 2.5" USB 3 external HDD, one NTFS partition mounted using ntfs-3g, so quite similar:

    Reference system

    So, that's read speed.

    Let's take a look at writing and reading on the actual file system:

    LE11 RPi 4 system

    Code
    kodiplayer:/var/media/content # sync && echo 3 > /proc/sys/vm/drop_caches && dd if=/dev/zero of=testfile bs=128k count=16k && ls -lah testfile && dd if=testfile of=/dev/null bs=128k && rm testfile
    16384+0 records in
    16384+0 records out
    2147483648 bytes (2.0GB) copied, 186.714454 seconds, 11.0MB/s
    -rw-r--r--    1 root     root        2.0G Feb 24 10:24 testfile
    16384+0 records in
    16384+0 records out
    2147483648 bytes (2.0GB) copied, 274.038229 seconds, 7.5MB/s

    Reference system

    Code
    root@rastaman:/mnt/Blue-1TB# sync && echo 3 > /proc/sys/vm/drop_caches && dd if=/dev/zero of=testfile bs=128k count=16k && ls -lah testfile && dd if=testfile of=/dev/null bs=128k && rm testfile
    16384+0 records in
    16384+0 records out
    2147483648 bytes (2.1 GB, 2.0 GiB) copied, 27.9964 s, 76.7 MB/s
    -rwxrwxrwx 1 nico root 2.0G Feb 24 10:21 testfile
    16384+0 records in
    16384+0 records out
    2147483648 bytes (2.1 GB, 2.0 GiB) copied, 42.7862 s, 50.2 MB/s

    Well... That's not glorious for the LE11 RPi 4!

    Remember that this is a H/W rev. 1.5 RPi 4 (running nightly-20220219-ae4b7da). I have a carbon-copy system in production elsewhere, but with H/W rev. 1.2 (running LE11 nightly-20220120-f7f2fd5; I'm a bit scared of upgrading it, it works):

    Code
    LibreELEC:/var/media/Media # sync && echo 3 > /proc/sys/vm/drop_caches && dd if=/dev/zero of=testfile bs=128k count=16k && ls -lah testfile && dd if=testfile of=/dev/null bs=128k && rm testfile
    16384+0 records in
    16384+0 records out
    2147483648 bytes (2.0GB) copied, 17.060671 seconds, 120.0MB/s
    -rw-r--r--    1 root     root        2.0G Feb 24 10:36 testfile
    16384+0 records in
    16384+0 records out
    2147483648 bytes (2.0GB) copied, 0.748839 seconds, 2.7GB/s

    A striking difference!

    I'm probably going to upgrade the production system to the same LE11 nightly build as the problematic system. I'm not too thrilled about it, but it will help establish whether it's a software issue...

    So, maybe not a Samba issue, but a disk I/O performance issue...

    If their bug system is broken, just try the mailing list:

    https://www.samba.org/samba/bugreports.html

    It's sadly quite normal that Linux stuff has no user-friendly way to report bugs etc.

    Sometimes mailing lists are the only way. Welcome to 1990 :)

    See, your negativity shamed them enough to finally send me an email and allow me to open an account! ^^

    14988 – smbd pauses activity for 10s of seconds before resuming, silently

    Let's see what happens now

    Ah!

    On the client:

    Code
    Feb 14 21:46:10 grabber kernel: CIFS: Attempting to mount //10.25.25.2/content
    Feb 14 21:55:20 grabber kernel: CIFS: VFS: \\10.25.25.2 sends on sock 00000000d18678b6 stuck for 15 seconds
    Feb 14 21:55:20 grabber kernel: CIFS: VFS: \\10.25.25.2 Error -11 sending data on socket to server
    Feb 14 21:58:32 grabber kernel: CIFS: VFS: \\10.25.25.2 has not responded in 180 seconds. Reconnecting...
    Feb 14 21:58:58 grabber kernel: CIFS: VFS: \\10.25.25.2 sends on sock 000000009546efff stuck for 15 seconds
    Feb 14 21:58:58 grabber kernel: CIFS: VFS: \\10.25.25.2 Error -11 sending data on socket to server
    Feb 14 21:59:27 grabber kernel: CIFS: VFS: \\10.25.25.2 sends on sock 0000000022e8f049 stuck for 15 seconds
    Feb 14 21:59:27 grabber kernel: CIFS: VFS: \\10.25.25.2 Error -11 sending data on socket to server

    On the server:

    https://youplala.net/~nico/log.10.25.25.1

    The two machines agree on time (NTP).

    Again, it's as if smbd is taking a break without noticing...

    Such bets are illegal (correct wording?), because only *you* know how big your disk is! ;)

    Code
    /dev/mmcblk0p2 13.5G 2.1G 11.4G 16% /storage
    kodiplayer:~ # ls -lah log.10.25.25.1
    -rw-r--r-- 1 root root 1.9G Feb 14 16:23 log.10.25.25.1

    No drops... I've tried rebooting both machines a few times... Grrrrr... Sad, considering that it was dropping constantly while I was setting up the client-specific logging!

    [MANY HOURS LATER] At this rate, I'm going to believe that logging fixes the problem! Still no drop, at all... Solution: client-specific log at level 10, redirected to /dev/null! Only joking...

    UK morning all!

    I have a bit of time to dedicate to this today.

    First, I don't believe that I've ever given the client's mount parameters:

    Code
    //10.25.25.2/content on /mnt/content type cifs (rw,relatime,vers=3.1.1,cache=strict,username=guest,uid=1000,forceuid,gid=1000,forcegid,addr=10.25.25.2,file_mode=0770,dir_mode=0770,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,x-systemd.automount)

    Funny to see actimeo in there; I associate it with NFS, but mount.cifs does accept it (attribute cache timeout)...

    The server's parameters are default LE, automatically sharing external discs.
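    One experiment I may try, purely an assumption on my part rather than anything suggested in the thread: the mount above is soft, so the -11 errors are what a timeout looks like; remounting hard would turn a stall into a plain hang, which can help tell a server-side pause from a client-side timeout. A sketch reusing the share and mount point from the output above:

    Code
    # Sketch: remount the share "hard" so a server stall hangs the I/O instead
    # of erroring out after a timeout (share/mount point as in the output above).
    umount /mnt/content
    mount -t cifs //10.25.25.2/content /mnt/content \
        -o guest,vers=3.1.1,uid=1000,gid=1000,hard,echo_interval=60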

    The gaps in the logs were in the Samba logs, with log level 10 on.

    Those Samba bugs are not really helpful, but I'm not the best person to judge.

    I'll try to coerce the LE samba to do some client-specific logging and come back here with data.
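    For reference, the knobs involved are just standard smbd ones; a sketch of what I mean by client-specific logging, to be merged into the [global] section (the log path, and where LE expects its user Samba config, are assumptions on my part):

    Code
    # Sketch: per-client log files at debug level 10 (log path is an assumption).
    # %I expands to the client's IP address; %m would give the NetBIOS name.
    [global]
        log level = 10
        log file = /storage/samba-logs/log.%I
        max log size = 0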

    [EDIT] It's on and working, giving very dense/verbose logs. How much do you want to bet that the error won't crop up for hours, or at least not before the /storage partition of my SD card is full?

    Oh, and one last data point, not the smallest: the similar production setup running LE11 20220120, which I believed was not subject to the issue, has displayed the same problem. This is depressing, as it removes the possibility of a known-good reference point...