[SOLVED] LE8 OpenVPN throughput is only half of Armbian

  • I am seeing a noticeable difference in encryption and decryption performance between LE and Armbian on the same Amlogic S905 box. This matters for OpenVPN, so it should be relevant to all of us running Kodi with a VPN connection, using Zomboided, for instance.

    LibreELEC seems to deliver roughly half the throughput of an Armbian install on the same hardware (NexBox A95X with an Amlogic S905 chipset, the non-X version).

    Here are the stats:

    LibreELEC 8.0.1h-temp_sensor_disabled

    Kodi:~ # openssl speed -evp aes-128-cbc
    Doing aes-128-cbc for 3s on 16 size blocks: 4558454 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 1290974 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 334686 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 84452 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 10585 aes-128-cbc's in 3.00s
    LibreSSL 2.4.4
    built on: date not available
    options:bn(64,32) rc4(ptr,int) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(idx)
    compiler: information not available
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc      24311.75k    27540.78k    28559.87k    28826.28k    28904.11k

    Average throughput is ~27628 for LibreELEC.

    Balbes ARMBIAN 5.27 stable Ubuntu 16.04.2 LTS 3.14.29 (headless server build):

    kodi@amlogic-s905x:~$ openssl speed -evp aes-128-cbc
    Doing aes-128-cbc for 3s on 16 size blocks: 8416879 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 2436154 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 637152 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 161162 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 20208 aes-128-cbc's in 3.00s
    OpenSSL 1.0.2g 1 Mar 2016
    built on: reproducible build, date unspecified
    options:bn(64,64) rc4(ptr,char) des(idx,cisc,16,int) aes(partial) blowfish(ptr)
    compiler: cc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc      44890.02k    51971.29k    54370.30k    55009.96k    55181.31k

    Average throughput is ~52284 for Armbian.

    So LibreELEC appears to deliver only ~53% of the encryption throughput that Armbian achieves on the same hardware, which directly limits OpenVPN performance. Why is that?
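
    For anyone who wants to check the averages and the ~53% figure, here is a minimal Python sketch. It only parses the two summary lines quoted above; the helper names (parse_speed_line, mean) are made up for illustration and are not part of any tool.

```python
def parse_speed_line(line):
    """Return the throughput figures (in 1000s of bytes/s) from an
    'openssl speed' summary line such as 'aes-128-cbc 24311.75k ...'."""
    # First token is the cipher name; the rest are numbers with an optional 'k'.
    return [float(token.rstrip("k")) for token in line.split()[1:]]


def mean(values):
    """Plain arithmetic mean, which is how the averages in the post were formed."""
    return sum(values) / len(values)


# Summary lines copied from the two benchmark runs above.
libreelec = parse_speed_line(
    "aes-128-cbc 24311.75k 27540.78k 28559.87k 28826.28k 28904.11k")
armbian = parse_speed_line(
    "aes-128-cbc 44890.02k 51971.29k 54370.30k 55009.96k 55181.31k")

le_avg = mean(libreelec)
arm_avg = mean(armbian)
ratio = le_avg / arm_avg

print(f"LibreELEC average: {le_avg:.0f}k")
print(f"Armbian average:   {arm_avg:.0f}k")
print(f"LibreELEC / Armbian: {ratio:.1%}")
```

    Note that this is a simple mean across block sizes, so it weights the 16-byte result the same as the 8192-byte one. It is good enough for a rough comparison, nothing more.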

    And can this be used in our favor in future LE8 versions for the S905/X Amlogic boxes? In other words, is there any low-hanging fruit in terms of optimizing encryption and decryption speed on LE8 for the Amlogic boxes we use and grind on a daily basis?

    Edited once, last by fxfxfx (April 24, 2017 at 11:09 AM).

  • The lower performance is because LE is using LibreSSL instead of OpenSSL. LibreSSL is slower as it doesn't include any target-specific assembly code.

    Things will change in LE9 as it switches back to OpenSSL.

  • Just a quick update for those with special interest in OpenVPN performance on the Amlogic boxes. I tried installing the May 31st Debian Jessie release from Balbes and got even more impressive results on my Nexbox A95X S905 (non-X):

    frank@amlogic:~$ openssl speed -evp aes-128-cbc
    Doing aes-128-cbc for 3s on 16 size blocks: 9027426 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 2822472 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 757706 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 192997 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 24256 aes-128-cbc's in 2.99s
    OpenSSL 1.0.1t 3 May 2016
    built on: Fri Jan 27 00:08:40 2017
    options:bn(64,64) rc4(ptr,char) des(idx,cisc,16,int) aes(partial) blowfish(ptr)
    compiler: gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc      48146.27k    60212.74k    64657.58k    65876.31k    66456.57k

    Average throughput is ~61070 for Armbian (Jessie).

    The only puzzling thing is the apparent version downgrade of OpenSSL from 1.0.2g (dated March 1st 2016) to 1.0.1t, which, to make it even more confusing, has the newer date of May 3rd 2016. Anyway, throughput is up ~17%.
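
    As a side note on how to read the output: the summary row follows directly from the "Doing aes-128-cbc for 3s" lines, as operations times block size divided by elapsed seconds, in 1000s of bytes. A small Python sketch using the Jessie figures above (the function name throughput_k is just an illustration):

```python
def throughput_k(operations, block_bytes, seconds):
    """Throughput in 1000s of bytes per second, the unit 'openssl speed' prints."""
    return operations * block_bytes / seconds / 1000.0


# (operations, block size in bytes, elapsed seconds) from the Jessie run above.
runs = [
    (9027426, 16, 3.00),
    (2822472, 64, 3.00),
    (757706, 256, 3.00),
    (192997, 1024, 3.00),
    (24256, 8192, 2.99),
]

results = [round(throughput_k(*run), 2) for run in runs]

for (_, block_bytes, _), value in zip(runs, results):
    print(f"{block_bytes:>5} bytes: {value:.2f}k")
```

    The values it prints match the aes-128-cbc row in the output above, including the 8192-byte figure where the run took 2.99s rather than 3.00s.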

  • S905X has crypto extensions, which make AES even faster:

    LibreELEC:~ # openssl speed -evp aes-128-cbc
    ...
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     143550.18k   412706.35k   756339.84k   972738.90k  1066242.23k

  • Thanks. I have access to three S905X boxes, but never tested them as OpenVPN servers, as I 'knew' that the CPU of the S905 was somewhat more powerful than the S905X, and assumed the S905 would be superior as an OpenVPN server because of it. But the crypto extensions in the S905X change the game for my specific use case.

    I will have to replace my current OpenVPN server (S905 box) with a cryptographically beefier S905X replacement.

    Very useful info, indeed. Thanks again.

  • I have failed to replicate your performance results for S905X boxes running LE 8.0.2a, since that version is still running LibreSSL 2.4.4. Are you running a dev build/LE9 version/modded LE8? I see that you have OpenSSL 1.0.2k, rather than LibreSSL - intriguing, as I thought that was LE9 only :)

    My LibreSSL test on a S905X 2G/8G box running LE 8.0.2a:

    Kodi:~ # openssl speed -evp aes-128-cbc
    Doing aes-128-cbc for 3s on 16 size blocks: 4552103 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 1283597 aes-128-cbc's in 2.98s
    Doing aes-128-cbc for 3s on 256 size blocks: 333310 aes-128-cbc's in 2.99s
    Doing aes-128-cbc for 3s on 1024 size blocks: 84084 aes-128-cbc's in 2.99s
    Doing aes-128-cbc for 3s on 8192 size blocks: 10522 aes-128-cbc's in 2.98s
    LibreSSL 2.4.4
    built on: date not available
    options:bn(64,32) rc4(ptr,int) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(idx)
    compiler: information not available


    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc      24277.88k    27567.18k    28537.58k    28796.66k    28924.91k

  • Tried and tested on S905 and S905X. So far, I cannot achieve the ~100,000k+ performance seen above on an S905X. The S905 and S905X achieve identical performance on my boxes, around 66,000k max. Still, that is a definite improvement over LibreSSL, and now comparable to Debian on S905/S905X.

    OK, so a score update for the interested, based on LE 8.0.2b / OpenSSL:

    S905X (2G/16G)

    LibreELEC (community) Version: 8.0.2b-temp_sensor_disabled
    LibreELEC git: 41cba91834574a1e3388dcfca8d3444c78d61287

    S905X:~ # openssl speed -evp aes-128-cbc
    OpenSSL 1.0.2l 25 May 2017

    options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
    compiler: /home/kszaq/ocz/le-master/build.LibreELEC-S905.arm-8.0-devel/toolchain/bin/armv8a-libreelec-linux-gnueabi-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -march=armv8-a+crc -mabi=aapcs-linux -Wno-psabi -Wa,-mno-warn-deprecated -mcpu=cortex-a53 -mfloat-abi=hard -mfpu=neon-fp-armv8 -fomit-frame-pointer -Wall -pipe -Os -flto -ffat-lto-objects -march=armv8-a+crc -mtune=cortex-a53 -fuse-ld=gold -fuse-linker-plugin -flto -O3 -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM


    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc      52439.59k    61217.63k    65165.40k    66189.92k    66688.95k

    VERSUS

    S905 (1G/8G)

    LibreELEC (community) Version: 8.0.2b-temp_sensor_disabled
    LibreELEC git: 41cba91834574a1e3388dcfca8d3444c78d61287
    S905:~ # openssl speed -evp aes-128-cbc
    OpenSSL 1.0.2l 25 May 2017
    options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
    compiler: /home/kszaq/ocz/le-master/build.LibreELEC-S905.arm-8.0-devel/toolchain/bin/armv8a-libreelec-linux-gnueabi-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -march=armv8-a+crc -mabi=aapcs-linux -Wno-psabi -Wa,-mno-warn-deprecated -mcpu=cortex-a53 -mfloat-abi=hard -mfpu=neon-fp-armv8 -fomit-frame-pointer -Wall -pipe -Os -flto -ffat-lto-objects -march=armv8-a+crc -mtune=cortex-a53 -fuse-ld=gold -fuse-linker-plugin -flto -O3 -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM


    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc      52157.89k    61173.33k    65310.31k    66193.07k    66719.59k

  • That's odd. Here are my results on S905X:

    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     142601.09k   401685.67k   728886.88k   929721.00k  1011881.30k

    Edit: This is on 8.0.2b

  • Thanks, kszaq. A simple 'less /proc/cpuinfo' found the cause. The 'S905X box' I used for testing actually turns out to be an S905:

    Processor : AArch64 Processor rev 4 (aarch64)
    processor : 0
    processor : 1
    processor : 2
    processor : 3
    Features : fp asimd evtstrm crc32 wp half thumb fastmult vfp edsp neon vfpv3 tlsi vfpv4 idiva idivt
    CPU implementer : 0x41
    CPU architecture: 8
    CPU variant : 0x0
    CPU part : 0xd03
    CPU revision : 4
    Hardware : Amlogic
    Serial : 1f0c13003209a3a13e932786eb086f51
    Revision : 020c

    I will dig out the two remaining S905X boxes and check whether they are indeed S905X. And thanks, Jaaxx, for confirming the 100,000k+ performance on your end. I need the S905X's extra crypto performance and will bring in a new, and true S905X, box if needed.

    Edit: Also verified with this...

    S905X:~ # fw_printenv

    [...]
    hostname=arm_gxbb

    [...]

  • OK, so I thought it relevant to get around to posting an update.

    Running a 2G/8G S905X box now (Nexbox A95X) on LE 8.0.2e:

    Kodi:~ # less /proc/cpuinfo

    Processor : AArch64 Processor rev 4 (aarch64)
    Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 wp half thumb fastmult vfp edsp neon vfpv3 tlsi vfpv4 idiva idivt
    Hardware : Amlogic
    Revision : 020a

    S905X:~ # fw_printenv

    [...]

    aml_dt=gxl_p212_2g
    [...]
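
    The two /proc/cpuinfo dumps in this thread differ in exactly four feature flags: aes, pmull, sha1 and sha2 show up on the S905X and are missing on the S905. Here is a minimal Python sketch of that check, assuming those four flags are the ones to look for. The function name crypto_extensions is made up for illustration.

```python
CRYPTO_FLAGS = ("aes", "pmull", "sha1", "sha2")


def crypto_extensions(cpuinfo_text):
    """Return the crypto-related flags found on the first 'Features' line."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "features":
            flags = value.split()
            return [flag for flag in CRYPTO_FLAGS if flag in flags]
    return []  # no Features line found


# Features lines as posted in this thread.
s905 = ("Features : fp asimd evtstrm crc32 wp half thumb fastmult vfp edsp "
        "neon vfpv3 tlsi vfpv4 idiva idivt")
s905x = ("Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 wp half thumb "
         "fastmult vfp edsp neon vfpv3 tlsi vfpv4 idiva idivt")

print("S905: ", crypto_extensions(s905))
print("S905X:", crypto_extensions(s905x))
```

    On a live box the same function could be fed the real file, for example crypto_extensions(open("/proc/cpuinfo").read()). An empty result would suggest the box is not the S905X it was sold as.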

    And here are the performance benchmarks for the box:

    Kodi:~ # openssl speed -evp aes-128-cbc
    Doing aes-128-cbc for 3s on 16 size blocks: 22739647 aes-128-cbc's in 2.98s
    Doing aes-128-cbc for 3s on 64 size blocks: 15875682 aes-128-cbc's in 2.99s
    Doing aes-128-cbc for 3s on 256 size blocks: 7178216 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 2298816 aes-128-cbc's in 2.99s
    Doing aes-128-cbc for 3s on 8192 size blocks: 313328 aes-128-cbc's in 2.99s
    OpenSSL 1.0.2l 25 May 2017
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc     122092.06k   339813.93k   612541.10k   787286.82k   858455.84k

    So, still not the 1,000,000k+ score for 8192 bytes that kszaq and Jaaxx reported, which is a bit odd and unexpected, I guess, but an absolutely acceptable performance, with more than a 300% throughput increase over 8.0.1h on a 1G/8G S905 (non-X).
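
    How big the jump is depends a lot on the block size. A rough Python sketch that compares the S905 figures on 8.0.1h (LibreSSL) from the first post with the S905X figures above, block size by block size. The names here are hypothetical and only for illustration.

```python
BLOCK_SIZES = (16, 64, 256, 1024, 8192)


def speedups(baseline, candidate):
    """Map each block size to candidate/baseline throughput."""
    return {
        size: new / old
        for size, old, new in zip(BLOCK_SIZES, baseline, candidate)
    }


# aes-128-cbc rows, in 1000s of bytes per second, copied from this thread.
s905_libressl = (24311.75, 27540.78, 28559.87, 28826.28, 28904.11)
s905x_openssl = (122092.06, 339813.93, 612541.10, 787286.82, 858455.84)

factors = speedups(s905_libressl, s905x_openssl)

for size, factor in factors.items():
    # A factor of 5.0 corresponds to a 400% increase.
    print(f"{size:>5} bytes: {factor:5.1f}x  (+{(factor - 1) * 100:.0f}%)")
```

    This comes out at roughly 5x for 16-byte blocks and close to 30x for 8192-byte blocks, so the "more than 300%" figure holds even in the worst case.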

    EDIT:

    The S905 (non-X) on 8.0.2e is closing the gap to S905X crypto performance, for those interested in a comparison:


    amlogic:~$ openssl speed -evp aes-128-cbc
    OpenSSL 1.1.0f 25 May 2017
    The 'numbers' are in 1000s of bytes per second processed.

    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    aes-128-cbc      43428.43k    61313.79k    68540.84k    70619.48k    71245.82k    71286.78k

    Edited once, last by fxfxfx: Added S905 performance (July 9, 2017 at 11:12 AM).