Anyway, if this is the culprit
Unfortunately, it seems not to be. I turned on debug logging for PM/regulators, and it's interesting to see that with ondemand/schedutil the governor bumps the clock and voltages roughly every 30-40 ms; that makes up a large share of the debug output. The rest is mostly the MMC, plus whatever isn't logged at all. The only thing showing up more often in the debug output is the thermal side.
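To put a number on the "every 30-40 ms" observation without relying on debug logs, the cpufreq stats counters can be sampled directly. This is a minimal sketch in Python. It assumes the kernel has CONFIG_CPU_FREQ_STAT enabled (so `stats/total_trans` exists) and that the CPUs of interest belong to `policy0`. The helper names are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional

# Assumptions: CONFIG_CPU_FREQ_STAT is enabled and the CPUs of interest
# belong to cpufreq policy0.
POLICY_DIR = Path("/sys/devices/system/cpu/cpufreq/policy0")


def read_total_trans(policy_dir: Path) -> int:
    """Return the cumulative number of frequency transitions for a policy."""
    return int((policy_dir / "stats" / "total_trans").read_text().strip())


def mean_interval_ms(before: int, after: int, elapsed_s: float) -> Optional[float]:
    """Average time between transitions in a window, in milliseconds.

    Returns None if no transitions were recorded during the window.
    """
    delta = after - before
    if delta <= 0:
        return None
    return elapsed_s * 1000.0 / delta


def sample(policy_dir: Path = POLICY_DIR, window_s: float = 10.0) -> Optional[float]:
    """Sample total_trans over window_s seconds; return the mean interval in ms."""
    start_count = read_total_trans(policy_dir)
    start = time.monotonic()
    time.sleep(window_s)
    end_count = read_total_trans(policy_dir)
    return mean_interval_ms(start_count, end_count, time.monotonic() - start)
```

Calling `sample()` with ondemand or schedutil active should return something around 30-40 if the logs are representative. This only counts frequency changes recorded by cpufreq, not regulator voltage changes made independently of them.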
I'm not really sure what else it could be. I'm fairly confident it's related to DVFS, and I doubt it's a spinlock/mutex problem or anything in that realm, since I'd expect that to show up far more often. My assumption is that an under/overvoltage, or the rate of clock switching, is throwing something off in the SoC and causing it to "crash". What's interesting is that I've caught it while it was still responsive: all I did on the terminal was press the up arrow to recall the previous command, and it froze. Other times it's already frozen when I get to it: the shell is disconnected, the device doesn't respond to ping, and both the UI and the serial console are unresponsive. On the performance governor, the device is solid.
I'm not sure there's any way to debug it further, short of maybe JTAG. I also set clock-latency to see whether it has any bearing on the problem, the thought being that it would slow the rate of clock changes. I couldn't really tell whether anything is actually waiting for the clock to be adjusted.
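To check whether the clock-latency change was picked up at all, you can read the latency the driver ended up with back from sysfs. This is a minimal sketch. It assumes a cpufreq-dt style driver that takes its transition latency from the device tree, and the usual `/sys/devices/system/cpu/cpufreq/policyN` layout. The helper names are hypothetical. On the kernels I'm aware of, schedutil derives its default `rate_limit_us` from this latency. The exact formula and caps vary by kernel version, so reading the tunable back is more reliable than predicting it.

```python
from pathlib import Path
from typing import Optional

# Assumption: standard cpufreq sysfs layout.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpufreq")

# An unknown latency is reported as CPUFREQ_ETERNAL (-1), which reads back
# as 4294967295 when printed as an unsigned int; accept either form.
UNKNOWN_LATENCY = {-1, 4294967295}


def transition_latency_us(policy_dir: Path) -> Optional[float]:
    """Return cpuinfo_transition_latency (stored in ns) in microseconds.

    Returns None if the driver reports the latency as unknown.
    """
    raw = int((policy_dir / "cpuinfo_transition_latency").read_text().strip())
    if raw in UNKNOWN_LATENCY:
        return None
    return raw / 1000.0


def schedutil_rate_limit_us(cpufreq_dir: Path = CPUFREQ_DIR,
                            policy: str = "policy0") -> Optional[int]:
    """Return schedutil's rate_limit_us, or None if no tunables are found.

    Tunables live under policyN/schedutil when governors are per-policy,
    otherwise in a single global schedutil directory; check both.
    """
    candidates = (
        cpufreq_dir / policy / "schedutil" / "rate_limit_us",
        cpufreq_dir / "schedutil" / "rate_limit_us",
    )
    for path in candidates:
        if path.is_file():
            return int(path.read_text().strip())
    return None
```

Comparing `transition_latency_us(CPUFREQ_DIR / "policy0")` before and after the device tree change shows whether the new clock-latency value took effect. A `None` result means the driver still treats the latency as unknown.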
If that doesn't work out, I'll probably just switch to the performance governor.
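Switching to the performance governor at runtime only takes a sysfs write per cpufreq policy. This is a minimal sketch. It assumes the standard `/sys/devices/system/cpu/cpufreq/policyN` layout and root privileges. `set_governor` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List

# Assumption: standard cpufreq sysfs layout.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpufreq")


def set_governor(governor: str, cpufreq_dir: Path = CPUFREQ_DIR) -> List[str]:
    """Write `governor` to every policy's scaling_governor (needs root).

    Returns the names of the policies that were changed. Raises ValueError
    if a policy does not list the governor as available.
    """
    changed = []
    for policy in sorted(cpufreq_dir.glob("policy*")):
        available = (policy / "scaling_available_governors").read_text().split()
        if governor not in available:
            raise ValueError(f"{governor!r} not available for {policy.name}")
        (policy / "scaling_governor").write_text(governor)
        changed.append(policy.name)
    return changed
```

On a single-cluster SoC, `set_governor("performance")` would typically touch just `policy0`. The setting does not survive a reboot. To make it stick, apply it from a boot script or build the kernel with CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE.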