Posts by Kernle32DLL

    It's not a great surprise that crypto functions run better as 64-bit code on a 64-bit processor, but you may hit two challenges running aarch64 LE images. The first is that there are no aarch64 binary addons in our repo, because there are no official LE 9.0 aarch64 images. The second is that addons requiring libwidevine (e.g. Netflix/Amazon) will not work, as there are no 64-bit 'arm' versions of the lib in circulation to use. Being able to access DRM protected services is a hallmark feature/capability for Kodi 18, so this is the primary reason LE moved from aarch64 in LE 8.x releases to a split 64-bit kernel and 32-bit userspace arrangement (the same as Android) for LE 9.0.

    Darn, I totally forgot that. Makes sense. So no aarch64 then.

    But yeah, I will move the rock64 specific crypto discussion (e.g. rk_crypto) somewhere else, as my intent for creating this thread was - for the most part - to understand what cryptodev does in this particular LL setup (is that an ARM thing? Rockchip specific? Or else?).

    Hi everyone,

    first, a bit of preface so you can understand why I do what I do. You can skip to after the first horizontal line, if you aren't interested in that.

    For many months, I was running a RaPi 3. It was running fine for the most part, but some media files would very occasionally stutter. I checked everything, from limitations of my server (HDD, etc.) to bad network connections - all for nothing. Iperf told me that the raw network connection between my Pi and the server was absolutely fine, too. So what gives? I was using SFTP for the connection. I changed it to NFS - and all the problems were gone. After some more digging around I found the culprit: When streaming the media files (or doing a scp command on the Pi for that matter), one of the RaPi cores went to 100% - while the others kept idling. So it was obvious that the RaPi could not deliver the CPU performance to handle the crypto overhead. Right?

    Wrong - partially. Introducing the Rock64. I got that device a few months ago as a replacement for my RaPi. At this point a big shout out to everyone involved in getting the RK3328 to work so well with LibreELEC! It works great as a daily driver with Kwiboo's changes. However, the stuttering with SFTP was still there - bummer. Same behavior as on the RaPi, with one core maxed out, while the others idle around.

    After hours of searching on the net and not finding anything useful, I was ready to give up. But then I found two interesting things. First, I stumbled across the HPN-SSH project. I had read about it before, and I thought it would be a good way to get my feet wet with the LL build system. In short - I got it to work with the kitchen sink patch, and was able to try it out. I checked all 4 combinations of HPN server and client. As you can see in the following table, there were some speed improvements, but nothing big.

    Quick preamble on how the tests were conducted:

    Both servers were identical Debian 9 VMs with OpenSSH 7.4p1 built from source, one with and one without the HPN kitchen sink patch applied. Data rate was read from a scp call on the Rock64 to /dev/null. All HPN related tests were conducted with commit 3e3cc3d of the original LL repo (not Kwiboo's branch!)

    Cipher             HPN server,   HPN server,      Regular server,   Regular server,
                       HPN client    regular client   HPN client        regular client
    [email protected]  29.5MB/s      30.2MB/s         33.9MB/s          34.1MB/s
    aes128-ctr         48.6MB/s      53.5MB/s         60.7MB/s          55.8MB/s
    [email protected]  64.5MB/s      60.2MB/s         65.5MB/s          62.2MB/s
    aes128-cbc         74.4MB/s      64.7MB/s         76.3MB/s          72.4MB/s
    aes192-cbc         73.0MB/s      60.1MB/s         70.3MB/s          60.4MB/s
    aes256-cbc         56.5MB/s      57.3MB/s         65.3MB/s          54.5MB/s

    The numbers are a bit all over the place. But two things I deduced from this: First, for whatever reason, the HPN server seems a bit slower overall. Second - funnily enough, the fastest configuration is the HPN patched client with a non-HPN server. What's odd is the HPN server + client configuration, where some ciphers are faster and some are slower than their HPN server + regular client counterpart.

    These numbers underline my original problem very well, by the way - my favorite media file for testing (which triggers the stuttering right at the opening logo) peaks at a data rate of around 60MB/s - hence the stuttering. From the numbers, it seems the sftp connection is negotiated with chacha20-poly1305, as the other ciphers "should" be fast enough.
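To make the comparison against the 60MB/s peak explicit, here is a minimal sketch that only re-reads the table above. The row labels, the `PEAK` constant and both helper functions are illustrative names, and the cipher names are kept exactly as they appear in the table.

```python
# Measured scp rates in MB/s, copied row by row from the table above.
# Column order: HPN server + HPN client, HPN server + regular client,
# regular server + HPN client, regular server + regular client.
RATES = [
    ("row 1: [email protected]", [29.5, 30.2, 33.9, 34.1]),
    ("row 2: aes128-ctr", [48.6, 53.5, 60.7, 55.8]),
    ("row 3: [email protected]", [64.5, 60.2, 65.5, 62.2]),
    ("row 4: aes128-cbc", [74.4, 64.7, 76.3, 72.4]),
    ("row 5: aes192-cbc", [73.0, 60.1, 70.3, 60.4]),
    ("row 6: aes256-cbc", [56.5, 57.3, 65.3, 54.5]),
]

PEAK = 60.0  # peak rate of the test file, in the same unit as the table


def never_fast_enough(rates, peak):
    """Rows whose best configuration still stays below the peak."""
    return [name for name, values in rates if max(values) < peak]


def always_fast_enough(rates, peak):
    """Rows that reach the peak in every tested configuration."""
    return [name for name, values in rates if min(values) >= peak]


if __name__ == "__main__":
    print("never reaches the peak:", never_fast_enough(RATES, PEAK))
    print("always reaches the peak:", always_fast_enough(RATES, PEAK))
```

Only the first row stays under the peak in all four configurations, while rows 2 and 6 clear it in some configurations but not in others, so those two are borderline rather than clearly fast enough.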

    So, after the HPN tests pretty much were fruitless, I was looking for other options to get the load off from that single core. And this is how I read about Cryptodev on some forum (can't remember where). From what I could piece together from various forum posts, the architecture of both the RaPi 3 and Rock64 (ARMv8) should support some kind of crypto operations, which should make it all much faster - provided you can access it. So I set out to integrate cryptodev into LL. I got it kinda working after two days, to... mixed results. Hence this thread.

    First, I was able to verify that the cryptodev kernel module was indeed loaded, and that openssl was built correctly with cryptodev support:

    $ modinfo cryptodev
    filename: /lib/modules/4.4.114/cryptodev-linux/cryptodev.ko
    license: GPL
    description: CryptoDev driver
    author: Nikos Mavrogiannopoulos <[email protected]>
    depends:
    vermagic: 4.4.114 SMP mod_unload aarch64
    parm: cryptodev_verbosity:0: normal, 1: verbose, 2: debug (int)

    $ openssl engine -t
    (cryptodev) BSD cryptodev engine
    [ available ]
    (dynamic) Dynamic engine loading support
    [ unavailable ]

    Next, I did some openssl speed tests. One exemplary call to give you an idea - the other numbers are collected in the following table for an easier overview.

    cryptodev enabled type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
    no aes-128-cbc 137789.31k 388067.80k 698093.23k 892886.70k 971374.59k
    yes aes-128-cbc 56166.71k
    no sha1 15756.34k 57921.41k 179581.53k 377012.91k 557703.17k
    yes sha1 4224.80k

    So, what do these numbers tell me? I was already aware that smaller block sizes tend to be slower with cryptodev. Fair enough - bigger block sizes see dramatic improvements though. So, any changes in the openssh speeds? Yes - data rates were actually slower. But not by much, so it might be within the margin of error.
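As a rough worked example of the small-block penalty, the sketch below divides the cryptodev figure by the plain figure for each algorithm in the table above. It assumes that the single value in each "yes" row belongs to the first (16 bytes) column, which the table does not state; the dictionary and function names are illustrative.

```python
# Throughput in units of 1000 bytes/s, as printed by `openssl speed`,
# copied from the table above.
# ASSUMPTION: the lone value in each "yes" row is the 16 byte result.
WITHOUT_CRYPTODEV = {"aes-128-cbc": 137789.31, "sha1": 15756.34}
WITH_CRYPTODEV = {"aes-128-cbc": 56166.71, "sha1": 4224.80}


def relative_speed(with_engine, without_engine):
    """Ratio of engine throughput to plain throughput per algorithm."""
    return {
        name: with_engine[name] / without_engine[name]
        for name in with_engine
    }


if __name__ == "__main__":
    for name, ratio in relative_speed(WITH_CRYPTODEV, WITHOUT_CRYPTODEV).items():
        print(f"{name}: cryptodev reaches {ratio:.0%} of the plain rate")
```

Under that assumption, cryptodev reaches roughly 0.41 of the plain rate for aes-128-cbc and roughly 0.27 for sha1 at the smallest block size.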

    >> The big question now - what is happening here? Integrating cryptodev certainly did something, but not what I expected or hoped. <<

    So, all tests so far have been done with the LL master at 3e3cc3d. After some more investigation, I figured that maybe something else had to be enabled on the kernel side of things. This is where Kwiboo's fork came into play, primarily because it used an updated Rockchip kernel, which (among other things) exposed a new CONFIG_CRYPTO_DEV_ROCKCHIP configuration. After some googling it seemed that the RK3328 includes its own crypto engine, which might just not be accessible with my previous setup yet (falling back to the ARMv8 internal thingy?). So, compile time again....

    So, first things first. I built the current Kwiboo part-6 branch and integrated cryptodev. Openssl speed tests - pretty much no difference. I also verified that rk_crypto was not loaded yet. So, next: a build with CONFIG_CRYPTO_DEV_ROCKCHIP enabled. However, this did nothing. rk_crypto was not loaded, and subsequently there was no change in openssl performance. I am still looking into why that is, but currently I'm out of ideas.
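One thing that may be worth ruling out here is whether the option actually ended up in the running kernel, and in which form. A driver configured as `=y` is compiled into the kernel image and never shows up as a loaded module, while `=m` produces a separate module. The sketch below parses kernel config text for a symbol. The helper names are illustrative, and reading `/proc/config.gz` only works if the kernel was built with CONFIG_IKCONFIG_PROC, which is an assumption about this build.

```python
import gzip
import re


def config_state(config_text, symbol):
    """Return the value of a kernel config symbol ('y', 'm', ...),
    'n' if it is explicitly not set, or None if it is absent."""
    pattern = re.compile(rf"{re.escape(symbol)}=(.*)")
    for line in config_text.splitlines():
        line = line.strip()
        match = pattern.fullmatch(line)
        if match:
            return match.group(1)
        if line == f"# {symbol} is not set":
            return "n"
    return None


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (needs CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as handle:
        return handle.read()


# Illustrative config fragment, not taken from a real build.
SAMPLE = """\
CONFIG_CRYPTO=y
# CONFIG_CRYPTO_DEV_ROCKCHIP is not set
CONFIG_CRYPTO_USER_API=m
"""

if __name__ == "__main__":
    print(config_state(SAMPLE, "CONFIG_CRYPTO_DEV_ROCKCHIP"))
```

On the device itself, `config_state(read_running_config(), "CONFIG_CRYPTO_DEV_ROCKCHIP")` would show whether the option survived the build.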

    So, this is the second topic to discuss - Just what am I seeing here? Does cryptodev not pick up on rk_crypto? Or does it maybe not even support that? I'm totally out of my league here :-/
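Since cryptodev passes requests on to the kernel crypto API, one way to approach this question is to look at which implementations the kernel has registered in `/proc/crypto`, where the kernel normally prefers the highest-priority driver for a given algorithm name. The following is a minimal sketch of such a check. The function names are illustrative, and the driver names in the sample are made up for the example rather than captured from a Rock64.

```python
import re


def parse_proc_crypto(text):
    """Split /proc/crypto style text into one dict per registered entry."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            entries.append(entry)
    return entries


def drivers_for(entries, name):
    """Drivers offering algorithm `name`, highest priority first."""
    matches = [e for e in entries if e.get("name") == name]
    matches.sort(key=lambda e: int(e.get("priority", 0)), reverse=True)
    return [e.get("driver", "?") for e in matches]


# Hypothetical sample; real driver names will differ.
SAMPLE = """\
name         : cbc(aes)
driver       : cbc-aes-generic-example
module       : kernel
priority     : 100

name         : cbc(aes)
driver       : cbc-aes-hw-example
module       : example_hw
priority     : 300
"""

if __name__ == "__main__":
    print(drivers_for(parse_proc_crypto(SAMPLE), "cbc(aes)"))
```

If no Rockchip driver is listed for `cbc(aes)` in the real file, cryptodev has nothing from that engine to hand work to.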

    So, massive text wall. I would be glad for any input to sort this out, and interpret the numbers I am seeing here.

    EDIT 1:

    Well, I found out something neat while looking at the rockchip kernel config. The kernel can be compiled as aarch64 for supported cpus - such as the aforementioned RK3328 I'm using. So - did a recompile of LL with ARCH=aarch64 (instead of ARCH=arm, as mentioned in the readme), and bam, performance increased significantly. Openssl speed test saw performance improvements both with and without cryptodev. Transfer speeds for sftp also increased (for the most part - have yet to do a proper test sweep - So no numbers yet).

    I will do more tests tomorrow, and see if I can get some solid numbers. Also, I don't know if this radical performance change is just because of the aarch64 mode, or because of the additional "crypto" cpu flag in LL for RK3328 with aarch64. I will investigate.
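To help separate the two possible causes, it may be useful to first confirm that the CPU reports the ARMv8 Crypto Extensions at all. On Linux these show up as the `aes`, `pmull`, `sha1` and `sha2` entries in the `Features` line of `/proc/cpuinfo`. The sketch below extracts them; the function name and the sample text are illustrative, and the sample is not captured from a Rock64.

```python
# Feature names Linux uses for the ARMv8 Crypto Extensions.
CRYPTO_FLAGS = {"aes", "pmull", "sha1", "sha2"}


def crypto_features(cpuinfo_text):
    """Return the crypto related flags found in /proc/cpuinfo style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            found |= CRYPTO_FLAGS & set(value.split())
    return found


# Illustrative sample of a single core entry.
SAMPLE = "processor\t: 0\nFeatures\t: fp asimd evtstrm aes pmull sha1 sha2 crc32\n"

if __name__ == "__main__":
    print(sorted(crypto_features(SAMPLE)))
```

On the device, `crypto_features(open("/proc/cpuinfo").read())` gives the real answer. An empty set would mean the extensions are not reported, in which case the compiler flag alone cannot explain the speed-up.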

    kostaman, I have uploaded test .img.gz/.tar files for rock64 and box-trn9 at Index of /test/, built from my latest rockchip-part6 branch.

    It includes "RendererDRMPRIME: release video buffers after flush" (xbmc/xbmc pull request #13978), which should make playback a little bit more reliable.

    Please report new issues seen on these images compared to latest nightly.

    Nice! Everything (still) working nicely. I will test this build a bit more, but I don't presume I will find more errors. My personal highlight is still the 10bit thingy running.

    Something else that caught my eye tho: While digging through my media library for things to test, I actually found a (1080p) VC-1 movie file. It runs in SW, and rather badly. The video playback is too slow, and not in sync with audio (just like 4k BBB). Two things to note: While the RK3328 should be able to decode VC-1 in hardware - I presume LE integration will not be feasible soon? Nevertheless, I would have been under the impression that the RK3328 would be able to decode VC-1 in software sufficiently. Maybe this is worth taking a look at (I think I read somewhere that the RaPi 3 can decode VC-1 in software - hence the thought).

    I tried Kwiboo's branch "rockchip-part6" at d404a14.

    From what I can tell, this fixes the "black-after-stop" issue (as seen by kostaman in 330aad4).

    More importantly this also makes "Anime A" work for me. That is, the x264 10bit one. And cherry on top - no color artifacts like I got on my Raspberry Pi 3 \o/.

    Also, I did not encounter any video stuttering (or other problems) on resumed playback. I'm not 100% sure yet if it's actually fixed tho, as I was unable to reproduce it reliably anyway.

    Thanks a lot for your work Kwiboo ! :)

    EDIT: Also, I noticed that the errors which currently occur on 330aad4 (see attachment) when doing the initial startup (with partition resizing, etc.) are gone with the mentioned branch. Nice!

    The VPU in the RK3328 does not seem to handle bbb h264 2160p 60fps but should handle the h264 2160p 30fps version; I guess it is a hw limit (I have not tested any other 4k h264 60fps video).

    Hmmm, that's strange, considering that the Pine64 webpage advertises the Rock64 (with RK3328) as "64-bit 4K60P HDR Media Board Computer". Well, it does not mention with what encoding, tho :D

    You are correct, there was a missing depend from kodi to gbm-rockchip, please try the latest from rockchip-part6 again, it should have the correct depends now

    Will do

    So, as I finally got my hands on a Rock64 4GB on Friday, I thought I should weigh in here with some tests, too (as the Rock64 is a RK3328 device, too).

    I conducted tests with the 20180601 nightly build, as well as the "part-7" branch from Kwiboo. I also tried to compile the "part-6" branch, but that failed to compile (something about missing GDM - I presume it has something to do with this).

    EDIT: I noticed while writing this that Kwiboo rebased "part-6", I will try to compile this again now. I will also compile the current master, since this includes some neat changes.

    I did test the following files, all 1080p if not noted otherwise:

    BigBuckBunny 60fps, both 1080p and 4k

    Anime A:

    - FFProbe: h264 (High 10), yuv420p10le(progressive), 1440x1080 [SAR 1:1 DAR 4:3], 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)

    - VLC: H264 - MPEG-4 AVC (part 10) (avc1) / Planar 4:2:0 YUV 10-bit LE

    Anime B:

    - FFProbe: hevc (Main), yuv420p(tv, unknown/bt709/unknown), 1920x1080 [SAR 1:1 DAR 16:9], 23.98 fps, 23.98 tbr, 1k tbn, 23.98 tbc (default)

    - VLC: MPEG-H Part2/HEVC (H.265) (hevc) / Planar 4:2:0 YUV

    Series episode:

    - FFProbe: h264 (High), yuv420p(progressive), 1920x1080, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)

    - VLC: H264 - MPEG-4 AVC (part 10) (avc1)


    Movie:

    - FFProbe: h264 (High), yuv420p(tv, bt709/unknown/unknown, progressive), 1920x804, SAR 1:1 DAR 160:67, 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)

    - VLC: H264 - MPEG-4 AVC (part 10) (avc1)


    Nightly build 20180601:

    - BigBuckBunny 1080p: Works flawlessly

    - BigBuckBunny 4k: Video plays "too slow", and is out of sync with audio

    - Anime A: Unwatchable because of video stutter

    - Anime B: Works flawlessly

    - Series episode: Works flawlessly

    - Movie: Works flawlessly

    Kwiboo "rockchip part 7" (a9bed38daf236ecd502357890ae2bc47a55cf9c4)

    - BigBuckBunny 1080p: Works flawlessly

    - BigBuckBunny 4k: Video plays "too slow", and is out of sync with audio

    - Anime A: Unwatchable because of video stutter

    - Anime B: Works flawlessly

    - Series episode: Works flawlessly

    - Movie: Works flawlessly

    Other things to note which are of interest:

    - Anime A did run ok on my Raspberry Pi 3 with LE 8 and 9, but had very annoying color artifacts (probably because of 10bit?)

    - I did experience some problems across the builds with resumed playback - stuttering and outright stuck playback. But I was unable to reproduce this yet :|

    - I have no idea why the movie works, and the series episode doesn't (on 20180601) - for all I know they are encoded the same way. (see point above)

    - Neither 1080p nor 4k BigBuckBunny ran on the Raspberry Pi 3 (1080p with stutter, 4k did not even run)

    - The movie file works fine via NFS or USB, but stutters via SFTP. This happens on the Raspberry Pi, too - I have yet to figure out why (network is fast enough, and my laptop doesn't have the problem) (I found the reason - SFTP transfer speed is limited to 30mbit for me. I will investigate). All I know is that the bit rate peaks at 60mbit at the opening logo - which is when the stuttering happens.