"I also think there are extensive unit tests in Nu"
Older versions of Nu had lots of unit tests, yeah, but I haven't ported them over to the latest version yet and probably won't anytime soon, as I've unfortunately lost the motivation to work on anything Arc-related.
---
"Nu's benchmarks are meant to be runnable on multiple Arc implementations for comparison purposes, so they could be a good start."
Yeah, but for now all the Arc implementations need to be built on top of Racket, so the benchmarks work for ar, Arc 3.1, Nu, etc., but not, say, Rainbow. Getting them to work with non-Racket processes is probably doable, albeit difficult, and it would come at the cost of accuracy in the tests.
The accuracy problem could be mitigated with some sort of namespace system, such as the one Nu could have if I ever actually built the damn thing. But in that case, you might actually be better off building the benchmark tester program in C/D/Go/Racket/whatever and using an FFI to talk to the different implementations...
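
To make the accuracy cost a bit more concrete, here is a rough sketch in Python (not Nu's actual benchmark code) of what timing an implementation from the outside might look like. It assumes each implementation can be launched as a separate command. The names `time_command` and `benchmark` are made up for illustration, and the Python interpreter stands in for an Arc implementation. One plausible source of inaccuracy is that every measurement includes process startup, so the sketch subtracts a no-op baseline, which is only an approximation.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Return the median wall-clock seconds needed to run cmd as a child process."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def benchmark(startup_cmd, work_cmd, runs=5):
    """Estimate the time spent on the work itself.

    startup_cmd launches the implementation and exits immediately;
    work_cmd launches it and runs the benchmark body. The difference
    is a noisy estimate, clamped at zero because jitter can exceed
    the work time for small benchmarks.
    """
    startup = time_command(startup_cmd, runs)
    total = time_command(work_cmd, runs)
    return {"startup": startup, "total": total, "work": max(0.0, total - startup)}


if __name__ == "__main__":
    # Stand-in commands (assumption): swap in the real launcher for each
    # Arc implementation you want to compare.
    noop = [sys.executable, "-c", "pass"]
    work = [sys.executable, "-c", "sum(range(10**6))"]
    print(benchmark(noop, work))
```

For short benchmarks the startup time can easily be larger than the work being measured, which is why timing everything inside one process gives cleaner numbers.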