I don't know much about MzScheme internals. But from my experience of implementing Gauche Scheme, inlining basic operators such as <, +, and - is a big performance boost, and I suspect MzScheme does similar thing as far as these ops are not redefined. Calling them many times via 'wrapper' function like ac.scm does diminishes that effect.
That said, one possible overhead is that, although arc compiler tries to use ar-funcall2 to avoid consing the arguments, they are consed after all since _+ and _< are defined to take a list of arguments.
The original (time (fib 34)) took 79.8s on my machine.
Changing arc< in ac.scm with this:
(cond [(and (number? a) (number? b)) (< a b)]
[(and (string? a) (string? b)) (string<? a b)]
[(and (symbol? a) (symbol? b)) (string<? (symbol->string a)
[(and (char? a) (char? b)) (char<? a b)]
[else (< a b)])]
(cond ((all number? args) (apply < args))
((all string? args) (pairwise string<? args #f))
((all symbol? args) (pairwise (lambda (x y)
(string<? (symbol->string x)
((all char? args) (pairwise char<? args #f))
(#t (apply < args)))]))
made (time (fib 34)) 72.8s, and further changing _+ with this:
(cond [(and (string? a) (string? b)) (string-append a b)]
[(and (arc-list? a) (arc-list? b))
(ac-niltree (append (ar-nil-terminate a)
[else (+ a b)])]
(cond [(all string? args)
(apply string-append args)]
[(all arc-list? args)
(ac-niltree (apply append (map ar-nil-terminate args)))]
[else (apply + args)])]))
made (time (fib 34)) 49.5s.
But generally, these kind of tuning for microbenchmarks only have small effect in real applications, for microbenchmarks magnifies inefficiency of very specific parts.
(Afterthought: It won't be too difficult to write a Scheme macro that expands variable-arity type-dispatching functions like _+ or _+ into case-lambda form. Then this kind of optimization can be applied without cluttering the source too much.)