-march or TARGET_CPU - where to set it.

  • I tried "export", I tried it on the make line (PROJECT=Generic ARCH=x86_64 TARGET_CPU=silvermont make image), and I searched this forum.

    The only way I could make it "stick" is to edit the config/arch.x86_64 file and add it after the first logic block.

    I'm a noob... Is there a doc listing user-config variables like TARGET_CPU and how to set them for LibreELEC?

    thanks!

  • This is wrong, CPU optimization does not break binary compatibility. I have always used -O2 in my builds and for my NUC I use -march=haswell.

    The performance benefit is traditionally around 10% or even more in some cases. Definitely worthwhile for a slightly larger binary size.

  • thanks - I added the TARGET_CPU="silvermont" line in projects/Generic/options just after the first case statement.

    This change is for devices based on the Intel Cherry Trail processor.
    In my case, a Tronsmart Ara X5 Plus TV box (Cherry Trail Z8300, quad core, 2G/32G).

    Using Windows on this device is a crime against nature, and should be punished with
    an endless loop of the ORIGINAL Wolfenstein theme during your allocated sleep duration.
    :@

  • yes that change TARGET_CPU="x86-64" to TARGET_CPU="silvermont" is correct but
    then find line PROJECT_CFLAGS="-mmmx -msse -msse2 -mfpmath=sse" and change to PROJECT_CFLAGS=""

    Edited once, last by piotrasd (January 4, 2017 at 8:04 AM).


  • yes that change TARGET_CPU="x86-64" to TARGET_CPU="silvermont" is correct but
    then find line PROJECT_CFLAGS="-mmmx -msse -msse2 -mfpmath=sse" and change to PROJECT_CFLAGS=""

    Why would you blank out $PROJECT_CFLAGS?

    In my personal build, I replaced

    TARGET_CPU="x86-64"
    with
    TARGET_CPU="silvermont"

    and

    PROJECT_CFLAGS="-mmmx -msse -msse2 -mfpmath=sse"
    with
    PROJECT_CFLAGS="-mmovbe -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mpopcnt -maes -mpclmul -mrdrnd -mfpmath=sse"
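
For anyone who wants to script that substitution, here is a minimal sketch. It works on a scratch copy seeded with the two default lines quoted above, not on the real projects/Generic/options, and the variable names (options_file, patched_file, new_cpu, new_cflags) are only illustrative. Point it at a real tree at your own risk.

```shell
# Scratch copy holding the two default lines quoted in this thread.
options_file="$(mktemp)"
patched_file="$(mktemp)"
cat > "$options_file" <<'EOF'
TARGET_CPU="x86-64"
PROJECT_CFLAGS="-mmmx -msse -msse2 -mfpmath=sse"
EOF

# Values to write; set new_cflags="" to try the "blank it out" variant instead.
new_cpu="silvermont"
new_cflags="-mmovbe -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mpopcnt -maes -mpclmul -mrdrnd -mfpmath=sse"

# Rewrite both assignments, keeping any leading indentation intact.
sed \
  -e "s|^\([[:space:]]*TARGET_CPU=\)\"x86-64\"|\1\"$new_cpu\"|" \
  -e "s|^\([[:space:]]*PROJECT_CFLAGS=\)\".*\"|\1\"$new_cflags\"|" \
  "$options_file" > "$patched_file"

cat "$patched_file"
```

Writing to a second file instead of editing in place keeps the original around for a quick diff before anything is copied back.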

    So far all addons function. In fact I have noticed slightly lower memory usage, roughly 20-23% on average. However, I should state that I did not perform rigorous double-blind tests.

    Oddly enough, before I implemented these changes Chromium produced choppy output from Netflix videos. After the custom changes, Chromium produced better output. There are still some dropped frames, but only during highly dynamic scenes (fast action on high quality videos). Which makes me wonder about recompiling Chromium with the same customizations...

    Edited once, last by erythros (May 3, 2017 at 9:52 PM).

  • Are you using the TronSmart ARA5? Device type matters with all those optimizations, some of which are actually defaults regardless of compiler options.
    For instance -msse and -msse2 are enabled by default when compiling 64bit.
    -march=native might even work better than enabling every version of sse, which seems redundant.

    Optimizations like -O1, -O2, or -O3 might also improve size, memory, speed, etc.
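
One way to check which flags a given -march value already switches on, so the redundant ones can be dropped, is to filter gcc's own report. This is a rough sketch: enabled_flags is a made-up helper name, and it assumes the usual `gcc -Q --help=target` layout of an option name followed by [enabled] or [disabled].

```shell
# Print only the options that gcc reports as "[enabled]".
# Expects the output of `gcc -Q --help=target ...` on stdin.
enabled_flags() {
  awk '$2 == "[enabled]" { print $1 }'
}

# Example: list what -march=silvermont turns on, if a suitable gcc is installed.
# Errors (e.g. a non-x86 gcc) are silenced, in which case nothing is printed.
if command -v gcc >/dev/null 2>&1; then
  gcc -Q --help=target -march=silvermont 2>/dev/null | enabled_flags
fi
```

Any flag that shows up in that list does not need to be repeated in PROJECT_CFLAGS.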

  • I know this thread is very old, but for completeness, and because this page is the first search result for "libreelec silvermont" on Google:

    TARGET_CPU="silvermont" already enables MMX, SSE and so on

    x86-options.html

    -mfpmath=sse is the default for x86-64 and higher

    i386-and-x86_002d64-options.html

    So piotrasd's tip should be the best option for compiling for Silvermont CPUs, or are there other reasons why redundant PROJECT_CFLAGS should be set?

    And where is the best place to set -O2 or -O3 flag for LibreELEC? PROJECT_CFLAGS?

    Edited 2 times, last by zehner (September 20, 2018 at 4:17 PM).

  • FYI - One can get gcc to share specific flags for hardware like this:

    Code
    gcc -c -Q -march=native --help=target

    More conveniently, put them on a single line:

    Code
    gcc '-###' -E - -march=native 2>&1 | sed -r '/cc1/!d;s/(")|(^.* - )//g'

  • Okay so first thank you all for this useful and educational information. I have a few questions and I am sorry in advance for the level of detail and the resulting length of this post:

    yes that change TARGET_CPU="x86-64" to TARGET_CPU="silvermont" is correct but
    then find line PROJECT_CFLAGS="-mmmx -msse -msse2 -mfpmath=sse" and change to PROJECT_CFLAGS=""

    Okay so this refers to ~/LibreELEC.tv/projects/Generic/options, which by default contains TARGET_CPU="x86-64" and TARGET_FEATURES="64bit" (does this actually set -m64 in the CFLAGS?). From my understanding I should set TARGET_CPU="ivybridge", but should I also blank out TARGET_FEATURES="64bit" to TARGET_FEATURES=""? A little further down there is PROJECT_CFLAGS="" by default, and then there is also

    if [ -z "$TARGET_CPU" ]; then

    TARGET_CPU=core2

    which can be found in ~/LibreELEC.tv/config/arch.x86_64 and seems to refer to the former. Further down there is TARGET_CFLAGS="-march=$TARGET_CPU -m64 -mmmx -msse -msse2 -mfpmath=sse" (should this be blanked out?), and the last line says TARGET_FEATURES+=" mmx sse sse2". Lastly there is ~/LibreELEC.tv/config/optimize, which sets GCC_OPTIM="-Os" by default. I could change that to "-O2", or if I'm feeling really brave to "-O3", "-Os -O3" (is this syntax correct?) or maybe even "-Ofast". A little further down you'll find TARGET_CFLAGS="$TARGET_CFLAGS -fomit-frame-pointer", where $TARGET_CFLAGS is determined by whatever ARCH has been defined earlier, if any; otherwise the default would be whatever flags come as standard for the core2 ARCH and/or instruction set.
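
On the "-Os -O3" question: GCC accepts several -O options on one command line and the last one wins, so "-Os -O3" is valid but behaves like plain "-O3". A small sketch of that rule, where effective_opt is just an illustrative helper and not anything from the LibreELEC tree:

```shell
# Return the optimization option that takes effect in a flag string.
# GCC honours the last -O option it sees.
effective_opt() {
  last=""
  for flag in $1; do
    case "$flag" in
      -O*) last="$flag" ;;
    esac
  done
  printf '%s\n' "$last"
}

effective_opt "-Os -O3"
effective_opt "-march=ivybridge -O2 -fomit-frame-pointer"
```

So there is no point in combining levels; pick the one level you actually want.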

    Also, from what I've seen in the configs and scripts, it seems all of these things could possibly be defined by a command line option. Am I summarizing everything correctly, or what is the correct way to do it if I don't care whether the code runs on any machine other than mine because of the high level of optimization?
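
As for why a value given on the make line may not "stick" (as the first post found): one plausible explanation, assuming the options file is sourced as shell and assigns TARGET_CPU unconditionally like the default line quoted above, is that the file simply overwrites whatever came from the environment. The two functions below are hypothetical stand-ins for such a file, not the real LibreELEC scripts.

```shell
# Stand-in for an options file that assigns unconditionally.
load_options_fixed() {
  TARGET_CPU="x86-64"
}

# Stand-in for an options file that only supplies a default.
load_options_default() {
  TARGET_CPU="${TARGET_CPU:-x86-64}"
}

# Simulate "TARGET_CPU=silvermont make image" in two separate subshells.
fixed_result="$(TARGET_CPU=silvermont; load_options_fixed; echo "$TARGET_CPU")"
default_result="$(TARGET_CPU=silvermont; load_options_default; echo "$TARGET_CPU")"

echo "unconditional assignment: $fixed_result"
echo "default-only assignment:  $default_result"
```

With the unconditional form the command-line value is lost, which would match the behaviour described at the top of the thread; editing the file itself is then the only way in.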

    Final question: is it possible to use Profile Guided Optimization (PGO)? From my understanding, Link Time Optimization (LTO), for example, is compatible with most packages (apart from some Intel graphics driver stuff perhaps).

    FYI - One can get gcc to share specific flags for hardware like this:

    Code
    gcc -c -Q -march=native --help=target

    According to this I get the following CFLAGS (that is if I was to use them all) for my Ivy Bridge Core i7 3632QM:

    -m128bit-long-double -m64 -m80387 -mabi=sysv -maddress-mode=long -maes
    -malign-data=compat -malign-functions=0 -malign-jumps=0 -malign-loops=0
    -malign-stringops -march=ivybridge -masm=att -mavx -mavx256-split-unaligned-load
    -mavx256-split-unaligned-store -mbranch-cost=3 -mcmodel=default -mcx16 -mf16c
    -mfancy-math-387 -mfp-ret-in-387 -mfpmath=sse -mfsgsbase -mfunction-return=keep
    -mfxsr -mglibc -mhard-float -mieee-fp -mincoming-stack-boundary=0
    -mindirect-branch=keep -minstrument-return=none -mlarge-data-threshold=65536
    -mlong-double-80 -mmmx -mpclmul -mpopcnt -mprefer-vector-width=none
    -mpreferred-stack-boundary=0 -mpush-args -mrdrnd -mred-zone -mregparm=6 -msahf
    -msse -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -mssse3 -mstack-protector-guard=tls
    -mstringop-strategy=default -mstv -mtls-dialect=gnu -mtls-direct-seg-refs
    -mtune=ivybridge -mveclibabi=default -mvzeroupper -mxsave -mxsaveopt

    More conveniently, put them on a single line:

    Code
    gcc '-###' -E - -march=native 2>&1 | sed -r '/cc1/!d;s/(")|(^.* - )//g'

    This one gives me kind of the same output, but it takes the approach of listing what the CPU can't do rather than what it can do, as in the first one. It also seems to involve some super safe/standard-compliant stuff that seems to completely ignore -march=ivybridge and -O3 (using -march=native, -O2 and some PC98 kind of stuff), and yet it still fails to build:

    -march=ivybridge -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16
    -msahf -mno-movbe -maes -mno-sha -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma
    -mno-fma4 -mno-xop -mno-bmi -mno-sgx -mno-bmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm
    -mavx -mno-avx2 -msse4.2 -msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mrdrnd -mf16c
    -mfsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f
    -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt
    -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma
    -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mno-mwaitx -mno-clzero
    -mno-pku -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni -mno-vaes
    -mno-vpclmulqdq -mno-avx512bitalg -mno-movdiri -mno-movdir64b -mno-waitpkg -mno-cldemote
    -mno-ptwrite --param l1-cache-size=32 --param l1-cache-line-size=64
    --param l2-cache-size=6144 -mtune=ivybridge -fasynchronous-unwind-tables
    -fstack-protector-strong -Wformat -Wformat-security -fstack-clash-protection -fcf-protection
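
To boil a list like this down to only what the CPU actually supports, the -mno-* entries and everything that is not a -m feature switch can be filtered out. A rough sketch, where positive_flags is an invented helper name and the sample line is a shortened excerpt of the output above:

```shell
# Keep only positively enabled -m feature flags from a cc1 flag line on stdin.
# Drops -mno-*, -march=/-mtune=, --param pairs and all non -m options.
positive_flags() {
  tr -s ' ' '\n' \
    | grep -E '^-m' \
    | grep -Ev '^-m(no-|arch=|tune=)' \
    | paste -sd' ' -
}

printf '%s\n' "-march=ivybridge -mmmx -mno-3dnow -msse -mno-sse4a -mavx -mno-avx2 --param l1-cache-size=32 -mtune=ivybridge -fcf-protection" \
  | positive_flags
```

The result is much closer to the first command's list and easier to compare against what -march already implies.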

    I also used the Safe_CFLAGS guide and the GCC_optimization guide from the Gentoo Wiki. Whilst this is a completely different distribution I still find it very relatable as it describes the process of building a Linux OS from source and it is much more detailed and better explained than the equivalent on the GCC website.

    I also used the cpuid2cpuflags tool mentioned in the Gentoo Wiki, which I had to compile myself as there seems to be no Ubuntu/Debian .deb package available. This turned out to be my first and only successful build on Ubuntu so far:

    cpuid2cpuflags, which gave aes avx f16c mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3 as its output.


    Then I used the standard cpuid tool from Ubuntu:

    cpuid -1 -i, which uses the CPUID instruction directly; apparently it gives more reliable information and it doesn't need sudo.

    Otherwise sudo cpuid -1 -k could be used, which gets its information from the kernel and is therefore supposed to be less reliable.

    As for my build system, I am using a Lenovo ThinkPad Edge E531 with 8GB RAM and Ubuntu Studio 20.04 devel branch (I know, but still better than M$ I suppose) with GCC 9.3.0-10ubuntu1 and Python 2.7.17 from the python-is-python2 package from the Ubuntu Focal repo. I have also tried 18.04 LTS Server and even the 19.04 mini.iso under VirtualBox, but this was painfully slow, consumed a lot more memory (well, no surprise with an additional OS to feed I guess) and limited my access to the data.

    The Intel product specifications and the Datasheet Vol. 1 and Vol. 2 are also good resources.

    Now you might ask yourself why I am going into so much detail and what I am trying to achieve, considering it will probably lead to a lot of failed builds, lots of frustration and a lot of time wasted, all for a fat build and some lousy 10% performance increase (if I'm lucky, it might also go the other way) that I don't really need in the first place with my 720p telly and everything working just fine as it is in LE 9.2.2 stable. Well, I am absolutely new to this, and by doing this I hope to learn how to heavily optimize code, how it is affected by the different flags, what I can get away with and what I should rather stay away from. I have plans for future projects and I hope to gain some knowledge about building from source in general, and during these crazy Corona curfew days there's not much to do anyway I guess. Also, maybe there is someone out there who might find this thread a useful resource. /shrug