I spent far too long diagnosing why this worked in one set of hosts and not another. When the vrrp_script fails on one host, it's supposed to move the service to the other host, that's the whole point right?
This was working great in one pair of hosts, but not another. Turns out maths are hard and I'm dumb. The weight of the check script is subtracted from the priority. If priorityA - weight is still greater than priorityB, nothing happens. Someone thought it would be cool to change the weight of one of the hosts from 100 to 150. The weight of the test was only 2, so 148 was still a higher priority than the other host (100).
After changing the priority back to 101 everything started working normally. Alternatively I could have made the weight of the check 51.
Hopefully you didn't waste as much time as I did on that dumb a mistake.