Radxa Zero mmc issues

  • Running mainline U-Boot 2022.01 from Armbian on a Radxa Zero, the box image fails to boot because LZO decompression is not enabled in mainline on this board. I enabled LZO, rebuilt U-Boot, and now it looks like LibreELEC boots from SD, goes to resize the filesystem and ends up with file system write errors & corruption.

    Tried 3 different SD cards, from 2 different brands. So I guess either the SD slot is highly marginal, or perhaps an issue in LE's DT? I'll have to play with it some more from Armbian and see if that also causes filesystem corruption. Armbian is loaded into the eMMC and is running 5.16.9 kernel from mainline.

    Testing from Armbian to SD cards:

    kernel: [ 269.339082] mmc0: error -84 writing Cache Enable bit

    kernel: [ 269.339107] mmc0: tried to HW reset card, got error -84

    kernel: [ 269.378798] I/O error, dev mmcblk0, sector 2080 op 0x1:(WRITE) flags 0x100000 phys_seg 87 prio class 0

    kernel: [ 269.378827] Buffer I/O error on dev mmcblk0p1, logical block 32, lost async page write

    kernel: [ 269.378841] Buffer I/O error on dev mmcblk0p1, logical block 33, lost async page write

    kernel: [ 269.378846] Buffer I/O error on dev mmcblk0p1, logical block 34, lost async page write

    kernel: [ 269.378852] Buffer I/O error on dev mmcblk0p1, logical block 35, lost async page write

    kernel: [ 269.378857] Buffer I/O error on dev mmcblk0p1, logical block 36, lost async page write

    kernel: [ 269.378862] Buffer I/O error on dev mmcblk0p1, logical block 37, lost async page write

    kernel: [ 269.378867] Buffer I/O error on dev mmcblk0p1, logical block 38, lost async page write

    kernel: [ 269.378872] Buffer I/O error on dev mmcblk0p1, logical block 39, lost async page write

    kernel: [ 269.378890] Buffer I/O error on dev mmcblk0p1, logical block 40, lost async page write

    kernel: [ 269.378896] Buffer I/O error on dev mmcblk0p1, logical block 41, lost async page write

    kernel: [ 269.380164] I/O error, dev mmcblk0, sector 17216 op 0x1:(WRITE) flags 0x100000 phys_seg 87 prio class 0

    kernel: [ 269.405032] FAT-fs (mmcblk0p1): error, fat_get_cluster: invalid cluster chain (i_pos 484866)

    kernel: [ 269.405056] FAT-fs (mmcblk0p1): Filesystem has been set read-only

    kernel: [ 269.405065] FAT-fs (mmcblk0p1): error, fat_free_clusters: deleting FAT entry beyond EOF

    kernel: [ 269.625336] FAT-fs (mmcblk0p1): error, fat_get_cluster: invalid cluster chain (i_pos 484866)


    So it looks like either both images are broken, the hardware is broken, or it's the kernel, since both are on 5.16. There are also similar initialization errors (false starts) during detection, which show up twice in dmesg:

    [ 2.465881] mmc0: error -84 reading general info of SD ext reg

    [ 2.465914] mmc0: error -84 whilst initialising SD card

    Then eventually detected:

    [ 4.643469] mmc0: new high speed SDHC card at address 59b4

    [ 4.649821] mmcblk0: mmc0:59b4 LX32G 29.5 GiB

    [ 4.651880] mmcblk0: p1
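A minimal sketch for tallying these failures from a saved dmesg dump, so that runs on different kernels or DTBs can be compared at a glance. The patterns are taken from the log lines quoted above; the category names and the `summarize_mmc_errors` helper are made up for illustration. For context, -84 is -EILSEQ on Linux, which MMC host drivers commonly return for CRC errors.

```python
import re
from collections import Counter

# Patterns copied from the quoted log lines; category names are illustrative only.
PATTERNS = {
    "mmc_error": re.compile(r"mmc\d+: error (-\d+)"),
    "hw_reset_failed": re.compile(r"mmc\d+: tried to HW reset card, got error (-\d+)"),
    "block_io_error": re.compile(r"I/O error, dev (mmcblk\d+), sector (\d+)"),
    "buffer_io_error": re.compile(r"Buffer I/O error on dev (\w+), logical block (\d+)"),
    "fat_error": re.compile(r"FAT-fs \((\w+)\): error"),
}


def summarize_mmc_errors(log_text):
    """Count how many lines of log_text match each failure category."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


SAMPLE = """\
kernel: [ 269.339082] mmc0: error -84 writing Cache Enable bit
kernel: [ 269.339107] mmc0: tried to HW reset card, got error -84
kernel: [ 269.378798] I/O error, dev mmcblk0, sector 2080 op 0x1:(WRITE) flags 0x100000 phys_seg 87 prio class 0
kernel: [ 269.378827] Buffer I/O error on dev mmcblk0p1, logical block 32, lost async page write
kernel: [ 269.405032] FAT-fs (mmcblk0p1): error, fat_get_cluster: invalid cluster chain (i_pos 484866)
"""

if __name__ == "__main__":
    for name, count in summarize_mmc_errors(SAMPLE).items():
        print(f"{name}: {count}")
```

To use it on a real capture, read the output of `dmesg` into a string and pass that instead of `SAMPLE`.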

    Edited once, last by frakkin64 (February 12, 2022 at 11:53 PM).

  • I've moved this into a separate thread so it's easier to track the discussion. Can you share diff patches or clearer instructions on what you've enabled or disabled for LZO?

  • I've moved this into a separate thread so it's easier to track the discussion. Can you share diff patches or clearer instructions on what you've enabled or disabled for LZO?

    This is what was done for LZO on U-Boot, just enabling it on the defconfig. Armbian is just using straight mainline with minimal patches, so that issue is in upstream (It's hit or miss, some boards have it enabled and some do not in upstream).

    Diff
    diff --git a/configs/radxa-zero_defconfig b/configs/radxa-zero_defconfig
    index a9afb64ae06..bd3027a9c33 100644
    --- a/configs/radxa-zero_defconfig
    +++ b/configs/radxa-zero_defconfig
    @@ -63,3 +63,4 @@ CONFIG_VIDEO_DT_SIMPLEFB=y
     CONFIG_SPLASH_SCREEN=y
     CONFIG_SPLASH_SCREEN_ALIGN=y
     CONFIG_OF_LIBFDT_OVERLAY=y
    +CONFIG_LZO=y

    With that patch applied, it will decompress the kernel, and things look fine. As I mentioned both Armbian and LE have the same issue with the SD card, so I am leaning towards hardware/defective unit/marginal design. I guess the only way to rule that out would be to try vendor U-Boot and a vendor kernel.
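A quick way to confirm which decompressors actually ended up enabled is to check the generated `.config` after running the defconfig, rather than the defconfig itself, since a defconfig omits options that are already at their Kconfig default. The sketch below assumes the usual Kconfig text format; the `enabled_options` helper and the sample text are illustrative, not taken from a real build.

```python
def enabled_options(config_text, names):
    """Return the subset of names that are set to 'y' in Kconfig-style text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks and comments such as "# CONFIG_LZO is not set".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        if key in names and value == "y":
            enabled.add(key)
    return enabled


DECOMPRESSORS = ("CONFIG_LZO", "CONFIG_LZ4", "CONFIG_LZMA")

# Illustrative .config fragment for a build without the LZO change.
SAMPLE_CONFIG = """\
CONFIG_LZ4=y
CONFIG_LZMA=y
# CONFIG_LZO is not set
"""

if __name__ == "__main__":
    found = enabled_options(SAMPLE_CONFIG, DECOMPRESSORS)
    for name in DECOMPRESSORS:
        print(f"{name}: {'enabled' if name in found else 'missing'}")
```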

  • None of the Amlogic configs have it set by default but it shouldn't be an issue for LE images due to https://github.com/LibreELEC/Libr…otloader/config which is applied at build time.

    Yeah, haven't tested the U-Boot built with LE. I was using the box image on SD, booting from Armbian U-Boot. The other two (LZ4, LZMA) appear to be defaults when run through U-Boot's Kconfig already, but LZO is defaulted to no/off. I doubt it's a U-Boot issue, because I would expect once the kernel is up then U-Boot shouldn't play a role anymore.

    I think Radxa might have a build with legacy kernels, probably worth testing with that. If that doesn't work, then I am going to assume it is an SD issue.

  • Re-loaded the vendor U-Boot (Android bootloader), and popped in CE on an SD card, and well, everything is totally fine. So it seems like it may be a mainline issue.

    Tested more kernels with Armbian:

    5.16.9 bad

    5.15.18 bad

    5.15.8 bad

    Tested the Radxa image for Ubuntu Focal (with vendor kernel):

    5.10.69-10-amlogic-g617a45dd0fce ok

    So it appears to be a mainline issue, here is where I am at -- working on bisecting to find the offending commit:

    5.10.100 ok

    5.11 ok

    5.12 ok

    I'll update in the next few days to see if I can isolate it. The test is pretty simple: write 1GB to the SD card with the offending kernel using "dd if=/dev/zero of=/mnt/data bs=8192 count=131072" and you should get failures, plus some of the stuff above in dmesg.
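For repeated runs, the same write test can be scripted. This is only a rough equivalent of the dd command, not a replacement for it: it writes 131072 blocks of 8192 bytes (1 GiB) of zeros and then calls fsync so that a failed write-back is more likely to surface as an exception rather than only in dmesg. The `write_zeros` helper is made up for illustration, and the demo writes a small temporary file instead of touching an SD card.

```python
import os
import tempfile


def write_zeros(path, block_size=8192, count=131072):
    """Write count blocks of zeros to path and return the number of bytes written."""
    block = bytes(block_size)
    written = 0
    with open(path, "wb") as f:
        for _ in range(count):
            written += f.write(block)
        f.flush()
        # Push the data out of the page cache so I/O errors are reported here.
        os.fsync(f.fileno())
    return written


if __name__ == "__main__":
    # Small demo on a temporary file; use a path on the mounted SD card for a real test.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "data")
        try:
            total = write_zeros(target, count=16)
            print(f"wrote {total} bytes")
        except OSError as exc:
            print(f"write failed: {exc}")
```

With the default arguments the function writes the same amount of data as the dd command; checking dmesg afterwards is still worthwhile, since some errors may only be logged there.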

    Edited 2 times, last by frakkin64 (February 13, 2022 at 11:38 PM).

  • Got to the bottom of it, it is actually a DT issue.

    5.16.9 works fine with this DT:

    build/arm64-dts-amlogic-add-support-for-Radxa-Zero.patch at master · armbian/build

    The same kernel (literally the same one, just swapped DTB) with this DT causes SD corruption and HW resets on SD (this is the one that should be in stable 5.16.y, I believe, because Armbian kicks it out):

    build/arm64-dts-amlogic-add-support-for-Radxa-Zero-0001.patch at master · armbian/build

    Not quite sure why yet. Didn't see anything directly touching &sd_emmc_b, but didn't spend a lot of time looking at the full DT. Also repeated the test with 5.13 (which oddly has Wifi issues) and the 2 DTBs and found the same result.
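One way to narrow this down is to decompile both DTBs back to source (for example with `dtc -I dtb -O dts -o working.dts working.dtb`) and diff the results. The sketch below shows only the diff step, using Python's difflib; the `dts_diff` helper, the file names, and the placeholder node in the sample text are hypothetical and say nothing about what the real difference is.

```python
import difflib


def dts_diff(old_dts, new_dts, old_name="working.dts", new_name="broken.dts"):
    """Return a unified diff between two decompiled device tree sources."""
    return list(
        difflib.unified_diff(
            old_dts.splitlines(),
            new_dts.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


# Placeholder device tree fragments, purely for demonstration.
OLD_DTS = """\
/ {
    placeholder-node {
        status = "okay";
    };
};
"""

NEW_DTS = """\
/ {
    placeholder-node {
        status = "disabled";
    };
};
"""

if __name__ == "__main__":
    for line in dts_diff(OLD_DTS, NEW_DTS):
        print(line)
```

Decompiled sources from different builds can differ in phandle numbering, so expect some noise in the output that has nothing to do with the actual problem.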

  • I did a quick diff compare of the two Armbian patches and the first one you've linked is an early draft of Zero support based on a mix of the SEI510 and U200 device-trees. The second one looks like the final merged version which contains some differences to regulator naming and ensuring HDMI has power from boot.

    G12A and G12B silicon has an mmc bug that requires a workaround/quirk, but that's in the g12-common dtsi so all devices will inherit it. The issue hits SDIO wifi in some older kernels but has been resolved upstream for a while now. See the history for drivers/mmc/host/meson-gx-mmc.c and note that I have some patches in my 5.16.y tree that are not upstream (the 270º patch is allegedly not needed but seems to benefit non-BCM modules like QCA9377).

    NB: I have the impression that Armbian support for Amlogic isn't the best due to everyone on staff deliberately trying to look the other way and avoid the noise from unsupported "TV Box" users and board vendors not funding support. The net result seems to be a mixed bag of patches from various places and a generic defconfig. I respond to Q's when asked but I don't follow their development.

  • NB: I have the impression that Armbian support for Amlogic isn't the best due to everyone on staff deliberately trying to look the other way and avoid the noise from unsupported "TV Box" users and board vendors not funding support. The net result seems to be a mixed bag of patches from various places and a generic defconfig. I respond to Q's when asked but I don't follow their development.

    Yeah, I am not super impressed with how the patches are managed. It's a hodgepodge of diffs & am/format-patch styles, which of course am hates, so it's a PITA to pull it into Git and rebase the patches.

    So this same issue affects LE booted SDs, which means it's running an LE kernel & LE DTB. I'm mostly using Armbian kernels as a host because it is working & installed on the eMMC. So it seems like whatever the issue is, it is related to the latest DT. So do you think there is a missing patch with LE as well?

    Edit: I actually rebuilt 5.16.9 with only patches for bootsplash/fb & patches from LE, same result. I think I also tested 5.16.9 mainline with just bootsplash/fb patches, and had the same result.

    Edited once, last by frakkin64 (February 14, 2022 at 12:32 PM).

  • I was able to reduce the problem to this in the DT:

  • Okay.. I dropped that from the USB-C addition patch. That patch is a placeholder at the moment as (according to Neil) there's a polarity issue not handled in the fusb302 code that prevents the driver from probing. It's greek to me, might mean more to you?

  • Okay.. I dropped that from the USB-C addition patch. That patch is a placeholder at the moment as (according to Neil) there's a polarity issue not handled in the fusb302 code that prevents the driver from probing. It's greek to me, might mean more to you?

    I have a pretty minimal electronics background, took a class back in high school from some pretty awful teachers that didn't explain anything. :) I did look at the schematics, and what I found interesting is that the GPIO is in a 3.3Vdc power domain and it is connected to the enable pin of the regulator with an external 10k pull-up resistor to 5Vdc, so I thought that was odd. But I can't say the schematics are accurate or complete. Since the regulator enable pin has an external pull-up to 5Vdc, it should be on at boot, and I believe the GPIO configuration in the DT is purely for power management (being able to drive the enable pin to ground via the GPIO).

    What I didn't understand is why it caused issues with the SD card. The SD card does use a 3.3Vdc supply, so a bit lost on why that is. According to the schematics, this is the right GPIO pin.