The problem is the codec (VC-1): it isn't hardware decoded, and ffmpeg doesn't support multithreading for it.
(You typically see one core at 100% with two fairly idle.) As a result, some 1080p files struggle to keep up. Overclocking could help.
I profiled it and found that about 30% of the decode time is spent in two loop filter functions.
There's optimised assembly for them on x86, but ARM only has C code. It's something we'd like to improve.
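
For a rough sense of what that 30% is worth, here is a back-of-envelope sketch using Amdahl's law. Only the 30% figure comes from the profile; the speedup factors are hypothetical, not measurements of any real ARM assembly, and the function names are made up for illustration.

```c
#include <stdio.h>

/* Amdahl's law: overall speedup when a fraction `hot` of the run time
 * (0.0 to 1.0) becomes `factor` times faster and the rest is unchanged. */
double overall_speedup(double hot, double factor)
{
    return 1.0 / ((1.0 - hot) + hot / factor);
}

/* Print the overall decode speedup for a few assumed loop filter speedups.
 * The factors are illustrative guesses, not benchmark results. */
void print_speedups(double hot)
{
    const double factors[] = { 2.0, 4.0, 8.0 };
    const int count = (int)(sizeof factors / sizeof factors[0]);

    for (int i = 0; i < count; i++) {
        printf("loop filters %.0fx faster -> decode %.2fx faster\n",
               factors[i], overall_speedup(hot, factors[i]));
    }
}
```

Calling `print_speedups(0.30)` shows the trend. With 30% of the time in the loop filters, the ceiling is 1 / 0.7, or about 1.43x, even if they cost nothing at all, so faster loop filters would help borderline files but would not make up for the lack of multithreading on their own.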