LE11 Nightly - Samba server freezes, including logs - NTFS3 KERNEL DRIVER IS SLOW

  • if it's an "WD_BLACK P10 Game Drive" (WDBA3A0050BBK-WESN):

    - have CMR and

    - support TRIM (=> untrimmed => write performance ?)

    It's a basic and cheap WD Elements spindle. The same in both systems.

    WD Elements Portable USB 3.0 External Hard Drive Storage (1 TB to 5 TB): www.westerndigital.com

    sudo hdparm --direct -tT /dev/sdX


    might (I'm unsure) make the "echo 3 > /proc/sys/vm/drop_caches" step superfluous?
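For what it's worth, a hedged sketch of why --direct may make the manual cache drop unnecessary for hdparm's own read test (the device name /dev/sdX is a placeholder, and the command is guarded so the sketch is safe to run as-is):

```shell
# --direct opens the device with O_DIRECT, bypassing the page cache,
# so hdparm's raw-read timing does not need a prior cache drop.
# /dev/sdX is a placeholder; skip if no such block device exists.
dev=/dev/sdX
if [ -b "$dev" ]; then
  sudo hdparm --direct -tT "$dev"
else
  echo "skipping: $dev not present"
fi
```

Note that dd benchmarks on files still go through the filesystem and its page cache, so the drop_caches step remains useful for those.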

    In the end, even if the test benefits from some caching, you would expect better numbers than what I am getting on that particular RPi 4 Rev. 1.5.

    Nico

    Edited once, last by camelreef: Merged a post created by camelreef into this post. (February 24, 2022 at 5:28 PM).

  • I do not understand your comment.

    NTFS on Linux is really slow; it's not a "native" filesystem (just a FUSE driver) on Linux at the moment, and that's the reason the driver is rather slow.

    So if you have a slow device combined with a slow fs, you get very slow performance :)

    I use an NTFS formatted disk for practical reasons, as it is meant to be plugged regularly into Windows machines.

    that's why you have a NAS :)

  • NTFS on Linux is really slow; it's not a "native" filesystem (just a FUSE driver) on Linux at the moment, and that's the reason the driver is rather slow.

    So if you have a slow device combined with a slow fs, you get very slow performance :)

    that's why you have a NAS :)

    ntfs3, from Paragon, as currently used in LE11, is not FUSE like ntfs-3g, and is supposed to provide very good performance (it does on my Mac!) without resorting to user space.

    The system that is performing very poorly is using ntfs3, not the FUSE-based ntfs-3g.

    I have another identical system (but different RPi 4 h/w rev.), so also using ntfs3, that is performing well.

    I have provided a reference benchmark on Ubuntu running on a NUC7, using ntfs-3g/FUSE, that performs well, or at least adequately.

    Both RPis use the exact same external HDD brand/model/size; the NUC uses an older and smaller model from the same brand.

    As for a NAS, I do have one at home, see the signature's link, because I'm a geek :) . That little system is for a friend who is very much not a geek, without a NAS....

  • I haven't lost interest or fixed the issue. Normal life took over for a while.

    However, I am gathering enough storage to shift content about (only a few TB!) and switch the partitions from NTFS to ext4, to see if I/O performance improves.

    I'll report back!

  • if you need to do it under Windows:

    https://fastcopy.jp/

    faster copy with verify => unattended copy

    Thanks! I will take a look in case I need it in the future. However, copying from NTFS to NTFS is only half the story, as I will be using ext4 partitions in the end.

    I'm using a dead classic cp -rv within a muxed terminal on a spare Linux box. It will take the time it needs to take while I live my life normally!

  • quite clear.

    but along the way, data could silently get lost/damaged...

    It's bulky data, but nothing that can't be replaced! That being said, I can't remember the last time that cp copied anything less than perfectly.

    It will have taken 3 days, but I'm nearly done with the partition swap from NTFS to ext4 and the associated data juggling.

    Evaluating how that improves Samba will resume shortly!

  • Right, now using ext4 as the file system on the Samba shared 5TB USB v3 WD Elements 2.5" external hard drive.

    I'm trying to hammer the system as much as I can using the planned workflow, so far, so good!

    [UPDATE] I've added multiple massive file writes and reads in parallel over the network, on top of the unusually high-load workflow, and everything is holding up like a champ and behaving as expected. I'm getting very realistic, solid and consistent read and write performance numbers.

    I'm still reserving declaration of a successful solution for later, as the occurrence of the drops was pretty random.

    It looks like the ntfs3 (Paragon) kernel driver gives really bad performance in my context. I'll try to find out when it replaced ntfs-3g (FUSE) as the default driver in LE, as that may explain why the other, identical setup worked well for a good while. Ah! December 2021! This correlates!

    Replace ntfs-3g_ntfsprogs with Linux kernel 5.15 ntfs3 by heitbaum · Pull Request #5838 · LibreELEC/LibreELEC.tv (github.com)

    heitbaum, I don't know how high this will ever get into the LE crew's priorities, or if anything can be done about it, but ntfs3 may need attention.

    I'm losing some flexibility by going from NTFS to ext4, but everyone will survive; it's better than losing basic functionality!

    Many thanks to all who have pitched in!

    Edited 5 times, last by camelreef (March 4, 2022 at 8:50 AM).

  • isn't it already done by moving to kernel 5.15 ?

    https://www.phoronix.com/scan.php?page=…-For-Linux-5.15

    It has been done on Dec. 15. No need to look elsewhere, I've already provided the link to the LE PR above.

    Clearly ntfs3 is not that great for me. It's either because of my context, or because there is an issue in its current implementation in LE, since ntfs3, a kernel driver, is supposed to provide better performance than the older ntfs-3g, a user-space driver.
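For anyone wanting to confirm which driver a given mount actually uses: the fstype column in /proc/mounts reads "ntfs3" for the kernel driver, while ntfs-3g mounts show up as "fuseblk". A minimal sketch (the device and mountpoint in the examples are illustrative):

```shell
# Classify which NTFS driver a /proc/mounts line reflects. Each line has
# the form: <device> <mountpoint> <fstype> <options> <dump> <pass>
ntfs_driver() {
  case "$(printf '%s\n' "$1" | awk '{print $3}')" in
    ntfs3)        echo "ntfs3 (kernel)" ;;
    fuseblk|ntfs) echo "ntfs-3g (FUSE)" ;;
    *)            echo "other" ;;
  esac
}

# Illustrative entries; on a live system, feed it lines from /proc/mounts:
ntfs_driver "/dev/sda1 /var/media/content ntfs3 rw 0 0"    # ntfs3 (kernel)
ntfs_driver "/dev/sda1 /mnt/disk fuseblk rw,user_id=0 0 0" # ntfs-3g (FUSE)
```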

    Edited once, last by camelreef (March 4, 2022 at 8:29 AM).

  • there is an issue in its current implementation in LE, as ntfs3, a kernel driver

    that was my main point: it's an in-kernel driver

    maintained, etc., by kernel driver developers, so I would guess the LE maintainers do not have much influence on the implementation; they can just move packages, here the kernel, to a higher or the latest (stable) version

  • Looking at the R&D we have now undertaken - I think the sentence could be written something like:

    using samba to serve files from kernel ntfs3 …… whereas using samba to serve files from ext4 ….. In earlier kernel versions …. using ntfs-3g, samba was able to serve files from the ntfs external drive … (without issue) …..

    I’m not sure that we have validated anything else yet? What does

    Code
    time dd if=file.mov of=/dev/null bs=4M

    (or similar) show from a performance perspective? From each of the file system types?

  • Looking at the R&D we have now undertaken - I think the sentence could be written something like:

    using samba to serve files from kernel ntfs3 …… whereas using samba to serve files from ext4 ….. In earlier kernel versions …. using ntfs-3g, samba was able to serve files from the ntfs external drive … (without issue) …..

    I’m not sure that we have validated anything else yet? What does

    Code
    time dd if=file.mov of=/dev/null bs=4M

    (or similar) show from a performance perspective? From each of the file system types?

    Your summary is a good one. However, I would say that the ability to serve files using Samba is only a consequence of the root cause, i.e. the difference in disk I/O performance between ext4, ntfs3 and ntfs-3g mounted filesystems, at least in my context.

    The clear highlight is that I've first experienced the issue while staging a kit in January 2022, when I had a copycat production system having no issue, as it was in a software state predating December 15 2021, when the kernel update including the switch from ntfs-3g to ntfs3 arrived. Updating the software to a version including that switch also put the production system in a situation where the issue manifested itself.


    On to some benchmarking data...

    This is on the LE machine, no Samba involved, cache dropped, repeatable numbers:

    WD Element 2.5" external spindle, USB 3.0, 5 TB, NTFS formatted partition, ntfs3 kernel driver, RPi 4, LE 11 nightly-20220303-70c3c5d

    Code
    kodiplayer:/var/media/content # sync && echo 3 > /proc/sys/vm/drop_caches && dd if=/dev/zero of=testfile bs=128k count=16k && sync && echo 3 > /proc/sys/vm/drop_caches && dd if=testfile of=/dev/null bs=128k && rm testfile
    16384+0 records in
    16384+0 records out
    2147483648 bytes (2.0GB) copied, 186.714454 seconds, 11.0MB/s
    16384+0 records in
    16384+0 records out
    2147483648 bytes (2.0GB) copied, 274.038229 seconds, 7.5MB/s

    WD Element 2.5" external spindle, USB 3.0, 5 TB, ext4 formatted partition, RPi 4, LE 11 nightly-20220303-70c3c5d

    Code
    kodiplayer:/var/media/content # sync && echo 3 > /proc/sys/vm/drop_caches && dd if=/dev/zero of=testfile bs=128k count=32k && sync && echo 3 > /proc/sys/vm/drop_caches && dd if=testfile of=/dev/null bs=128k && rm testfile
    32768+0 records in
    32768+0 records out
    4294967296 bytes (4.0GB) copied, 40.273163 seconds, 101.7MB/s
    32768+0 records in
    32768+0 records out
    4294967296 bytes (4.0GB) copied, 42.869481 seconds, 95.5MB/s

    I should have done some tests over Samba before, using the NTFS filesystem, but it stupidly did not occur to me, sorry. However, it would most probably just have embarrassed Samba, which would have failed as explained by the Samba guys, and no meaningful data would have been gathered.

    If the "aio write size" setting does not help, the next step will be "sync always = yes" and "strict sync = yes". We need to avoid the kernel accumulating unwritten data in RAM, which can lead to the large timeouts that Samba has no control over. With "sync always = yes", my hope is that the disk slowness is smoothed out across all writes, rather than batched up until the kernel decides it's time to flush everything to the very slow disk.
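For reference, those options would live in the share definition in smb.conf; a hedged sketch (the share name and path are illustrative, and "sync always" trades throughput for predictable flushing):

```ini
[content]
   path = /var/media/content
   # Flush every write to disk immediately instead of letting the kernel
   # batch dirty pages; smooths out stalls at a throughput cost.
   sync always = yes
   strict sync = yes
   # Writes larger than this many bytes are handled asynchronously.
   aio write size = 4096
```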

    I ended up ignoring the Samba options, preferring to fix the root cause, bad disk I/O, instead of adapting Samba to it.

    Here is a benchmark over Samba, cache dropped, repeatable numbers. This is the current working solution, all is working well, disk I/O has enough performance, Samba is happy, the services higher up are happy:

    RPi 3B+, Samba mount with default system options, up-to-date DietPi <-> WD Element 2.5" external spindle, USB 3.0, 5 TB, ext4 formatted partition, Samba share with default system options, RPi 4, LE 11 nightly-20220303-70c3c5d

    Code
    root@grabber:/mnt/content# sync && echo 3 > /proc/sys/vm/drop_caches && dd if=/dev/zero of=testfile bs=128k count=16k && sync && echo 3 > /proc/sys/vm/drop_caches && dd if=testfile of=/dev/null bs=128k && rm testfile
    16384+0 records in
    16384+0 records out
    2147483648 bytes (2.1 GB, 2.0 GiB) copied, 64.9038 s, 33.1 MB/s
    16384+0 records in
    16384+0 records out
    2147483648 bytes (2.1 GB, 2.0 GiB) copied, 74.991 s, 28.6 MB/s

    Does this help?

    Edited 2 times, last by camelreef (March 5, 2022 at 8:56 AM).

  • btw, the dump write performance is just half the truth; reading/writing a lot of files is also horribly slower compared to ext4

    I'm ready to accept that.

    The use case is a few large files being constantly written to with small chunks of data. That can't be any better!

    In any case, if the dump benchmark is bad, the real world case of multiple read/writes will be worse.
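A rough way to see the many-small-files side of this, as opposed to the single-stream dump above, is to time creating a batch of tiny files on the mount under test; a minimal sketch (file count and size are arbitrary, and it writes to a temporary directory by default):

```shell
# Create 500 small (4 KiB) files and time the whole batch, including a
# final sync so buffered writes are counted. Point "dir" at the mount
# under test; mktemp -d is just a safe default.
dir=$(mktemp -d)
start=$(date +%s)
i=0
while [ "$i" -lt 500 ]; do
  dd if=/dev/zero of="$dir/f$i" bs=4k count=1 status=none
  i=$((i + 1))
done
sync
end=$(date +%s)
echo "wrote 500 files in $((end - start))s"
rm -rf "$dir"
```

Comparing the elapsed time on an ntfs3 mount against an ext4 one would quantify the "lots of files" penalty directly.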

    Edited once, last by camelreef (March 5, 2022 at 10:07 AM).