That's absolutely true. Reaching C speed with high level languages such as Lisp it's very very difficult. CMUCL and SBCL reach roughly the speed of C, but they've been developed for a long time.
As of loops speed vs. tail recursion speed, the difference shouldn't be too big.
Stalin performs as better as C in numerical programs and many other benchmarks. The most exciting thing is that unlike CL, stalin doesn't need type declarations to guide optimizations. It would infer as much type information as possible. The problems is that it compiles too slow and it's not maintained anymore.
Naively implemented tail recursions is still not fast, because many common loop optimizations can't be directly applied to them unless you eliminate the function calls and regard them as true goto's. It's a rough task because the global flow analysis is needed for eliminating as many calls as we can.