ZTE ZX2AA500 GB Ethernet Transceiver Problems (SOLVED, big news inside)

  • Hi all

    A number of TV boxes out there use this gigabit transceiver instead of the more popular RTL8211F.

    Almost all of them work fine at gigabit speed with the stock Android firmware, but get a non-working (or insanely slow) gigabit connection with the Linux kernel used by LibreELEC and CoreELEC.

    This does not apply to the few TV boxes that pair this PHY with 100Mbit line transformers; those only ever run at 100Mbit.

    There are a number of references to the issue around, like this one:

    T95Z Plus 3/32Gb display and network problems under 8.90.5

    A (bad) workaround is to force the link speed to 100Mbit. The community and I have already verified that the dts we use matches the stock Android one.

    So the only explanation I have is that there are some tweaks in the Linux kernel used by the Android firmware. I have already tried all 4 combinations of TX delay in the S912 stmmac glue driver, but it does not help, so I am 99% sure that the problem is with the RX delay (RX clock skew) configuration in the PHY.

    Using my company email I have already tried to reach out to the TV box manufacturer and to ZTE to gather some info; I am still waiting to see whether I succeed.

    There is absolutely no information on the internet about this transceiver; the ZTE website does not even mention that they make transceivers!

    But it is possible that Amlogic has some knowledge of this PHY. Because of my work I know how these Chinese manufacturers operate: it is possible that they used this PHY to reduce BOM cost and, after hitting design problems, asked Amlogic for support, which is why Amlogic may have info on it.

    I know that some of you, in particular @chewitt, have contacts with Amlogic, so maybe it is worth trying to see if they know anything.

    Bye

  • Good point, in fact the RTL8211F init function in the Amlogic driver disables EEE as well:

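    /* we want to disable eee */
    phy_write(phydev, RTL8211F_MMD_CTRL, 0x7);      /* select MMD device 7 */
    phy_write(phydev, RTL8211F_MMD_DATA, 0x3c);     /* register 0x3c of that device */
    phy_write(phydev, RTL8211F_MMD_CTRL, 0x4007);   /* data mode, no post-increment */
    phy_write(phydev, RTL8211F_MMD_DATA, 0x0);      /* write 0 to it */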

    Looking at the 4.18.x code instead, I see that it is now managed through MMD access (the new standard).
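
    For reference, the four writes quoted above follow the Clause 22 indirect MMD access sequence through registers 13 and 14. Below is a minimal userspace sketch of that sequence, assuming RTL8211F_MMD_CTRL and RTL8211F_MMD_DATA map to those two registers. The mdio_ops struct, the recording stub and all function names are hypothetical and only stand in for a real MDIO bus; this is not the Amlogic or mainline code.

```c
#include <assert.h>
#include <stdint.h>

/* Clause 22 registers for indirect MMD access (IEEE 802.3 Annex 22D). */
#define C22_MMD_CTRL        0x0d    /* MMD access control */
#define C22_MMD_DATA        0x0e    /* MMD access address/data */
#define C22_MMD_CTRL_NOINCR 0x4000  /* function = data, no post-increment */

/* Hypothetical bus abstraction: one callback that writes a Clause 22 register. */
struct mdio_ops {
    int (*write)(void *ctx, uint8_t reg, uint16_t val);
    void *ctx;
};

/* Write val to register reg of MMD device devad.
 * Returns 0 on success or the first bus error. */
int mmd_write(const struct mdio_ops *bus, uint8_t devad, uint16_t reg, uint16_t val)
{
    int ret;

    assert(devad < 32);  /* the device address field is 5 bits wide */

    ret = bus->write(bus->ctx, C22_MMD_CTRL, devad);  /* function = address */
    if (ret)
        return ret;
    ret = bus->write(bus->ctx, C22_MMD_DATA, reg);    /* latch the register address */
    if (ret)
        return ret;
    ret = bus->write(bus->ctx, C22_MMD_CTRL, C22_MMD_CTRL_NOINCR | devad);
    if (ret)
        return ret;
    return bus->write(bus->ctx, C22_MMD_DATA, val);   /* the actual data write */
}

/* Same four writes as the quoted init code: clear MMD 7 register 0x3c
 * (the EEE advertisement register). */
int disable_eee_advert(const struct mdio_ops *bus)
{
    return mmd_write(bus, 7, 0x3c, 0x0000);
}

/* Recording stub (hypothetical): logs writes instead of touching hardware. */
#define LOG_MAX 8
struct mdio_log {
    int count;
    uint8_t reg[LOG_MAX];
    uint16_t val[LOG_MAX];
};

int log_write(void *ctx, uint8_t reg, uint16_t val)
{
    struct mdio_log *log = ctx;

    if (log->count >= LOG_MAX)
        return -1;
    log->reg[log->count] = reg;
    log->val[log->count] = val;
    log->count++;
    return 0;
}
```

    Pointing mdio_ops.write at log_write and calling disable_eee_advert() records the same register/value pairs as the quoted snippet, which makes it easy to check a backport against the original sequence before trying it on the board.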

    So, I will now try to backport everything to the Amlogic kernel and see if it helps.

    Bye

  • So, I have tried adding the standard MMD EEE disable to a quick custom driver for the ZTE chip: no luck, unfortunately.

    So we need some info about the PHY configuration.

  • I ran iperf3 between the t95z and my access point:

    t95z -------> AP

    Code
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec   254 MBytes   213 Mbits/sec   44             sender
    [  5]   0.00-10.03  sec   252 MBytes   211 Mbits/sec                  receiver

    AP --------> t95z

    Code
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.04  sec  46.7 KBytes  38.1 Kbits/sec    9             sender
    [  5]   0.00-10.00  sec  0.00 Bytes  0.00 bits/sec                  receiver

    So, there is a decent speed in the TX direction (not even close to 1Gbit, but that is probably due to the AP CPU) and a ridiculous one in the RX direction.

    As far as I know, if the problem were with EEE, both directions should be slow (right @chewitt?).

    If so, either the PHY initialization requires some special handling (like Marvell and some RTL parts) or it is the RX delay. The latter is more probable in my opinion, because the same ZTE PHY works better on other boards.

    In fact the routing of the t95z is crazy: the PHY is on the opposite side of the PCB from the Ethernet line transformer and the S912, so the critical lines (both the Ethernet signals and RGMII) must traverse the PCB.

    But in both cases we need reference code for whatever sorcery they do in the stock Android firmware, or the datasheet or an application note for the PHY.

    Amlogic could help with this. On my side I am waiting for news from the manufacturer; ZTE replied to my company email promising that someone will provide info, let's see...

  • The one reference to that PHY that I found on Google was on schematics for the Rock64 (a Rockchip device) but marked optional, and TL Lim who makes that board says he has no memory of looking at ZTE parts. I've asked internally at Amlogic as well, but if it's only been used in a handful of devices I wouldn't be too hopeful.

  • Well, if you have a good relationship with a Chinese manufacturer, I think they may know how to reach out to ZTE and ask for a datasheet (or an application note with register descriptions). That is the same thing I asked the public ZTE contact, but they mostly support networking devices; anyhow, they promised to forward my question to the right person.

    Once we have the information I can implement the driver for the PHY.

    Anyhow, I have some more info. I have found a datasheet for a GMII PHY, ksz9031mnx.pdf.

    This datasheet explains the MMD access and also has a register description of the standard MMD registers!

    I think the interesting ones are:

    Addr 2h: Reg 4h GMII Control Signal Pad Skew

    Addr 2h: Reg 8h GMII Clock Pad Skew

    The ZTE PHY is an RGMII PHY, so it is possible that something is different, but now I have a place to start.

    Also, I have seen that the Amlogic BSP has a dedicated PHY driver directory, with a modified version of the Linux PHY driver and a new one.

    Neither of them is compiled in LibreELEC/CoreELEC, but it would not make any difference because there is no driver for the ZTE.

    Still, there is a chance that Amlogic knows something.

    Otherwise the only other option is that the Android firmware is doing something in user space, but I don't see any installed tools that could do this.

  • Have you tested with a mainline kernel? There are Armbian images for GXM using 4.18 on Yandex.Disk, which would be easier to work with for development and tinkering purposes. If you're capable of writing network drivers, the lack of a device-tree for the T95Z shouldn't faze you too much.

    The RK kernel team doesn't have much documentation on that chip; mostly a comment that it's compatible with the RTL8211F part, so the only difference might be power or a crystal oscillator. I haven't heard back from the AML team.

  • Well, I can write a phylib driver; a network driver is way more complicated.

    I can take a look at mainline, but I would still have to investigate the dts, since it now requires valid mdio phy bindings; that is why I also want to work on the Amlogic kernel.

    I have written a small application to access the MII registers via the MDIO bus, but I need to cross-compile it with the Linaro armhf gcc used by LibreELEC, and I am struggling to invoke it because of the include directories.

    How can use the libreelec compiler to build (with no Makefile) an external executable? (solved)

    Edited once, last by Menion (August 23, 2018 at 11:00 AM).

  • So, I got the dumps of the Android and CE PHY registers. I see some differences in the vendor custom registers, even in some RW registers, which in my opinion explains the different behaviour between Linux and Android. The device returns all 0 for the standard MMD registers.

    I have tried to align the registers between Android and Linux (some of them are RO), but it does not help (I also tried resetting the PHY afterwards). So I think the only option left is to get our hands on the datasheet or, even better, an application note.
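
    The comparison of two dumps can be sketched roughly as below. This is only an illustration: the function name mii_dump_diff is hypothetical, each dump is assumed to hold the 32 Clause 22 registers in order, and no real register values from the t95z are included.

```c
#include <stdint.h>
#include <stdio.h>

#define MII_NREGS 32  /* Clause 22 exposes registers 0..31 */

/* Print every register whose value differs between two dumps and
 * return the number of differing registers. Registers 16..31 are
 * the vendor-specific range and are tagged as such. */
int mii_dump_diff(const uint16_t ref[MII_NREGS],
                  const uint16_t cur[MII_NREGS], FILE *out)
{
    int ndiff = 0;

    for (int reg = 0; reg < MII_NREGS; reg++) {
        if (ref[reg] == cur[reg])
            continue;
        fprintf(out, "reg %2d%s: 0x%04x -> 0x%04x\n",
                reg, reg >= 16 ? " (vendor)" : "",
                (unsigned)ref[reg], (unsigned)cur[reg]);
        ndiff++;
    }
    return ndiff;
}
```

    Feeding it the Android dump as ref and the Linux dump as cur lists exactly the registers that something on the Android side has changed, which is the short list worth trying to write back by hand.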

  • But if that is so, how can we explain the different behaviour between stock Android and Linux, even though they run the same Amlogic BSP (kernel 3.14)?

    Also, the DMA sits on top of the stmmac-compatible controller, so it is completely "internal" to the S912.

    And from the MDIO register dumps it is clear that the first vendor-specific register, MDIO reg 16, is writable and differs between Android and Linux, so something has written it that is not present in Linux.

  • chewitt, do you have any hints for the mainline kernel? I have compared the mainline q201 (and q200) dts with the one we run in LE/CE. Of course they are different, but I don't see anything critical that could explain why it is not booting in mainline on the t95z.

    Not booting = boot logo and no IP obtained on the interface, but I cannot say whether the Linux subsystem is running behind the boot logo and there is "only" a problem with DRM and the Ethernet controller, since I don't have UART access and honestly I would like to avoid soldering it.