It's not often the case, but when eval is right, it's right.
Good work! Looks like you have all the angles covered in your point-by-point breakdown at the end, and evaluating is really all the CL version is doing too. Honestly, the biggest "gotcha" for me is point 2, but you deal with that neatly. The tradeoff is totally worth it for having on-line "redefinition" of the macros you're testing. Just make sure point 2 is documented as a caveat, and you're good to go.
Certainly a shorter answer to the whole discussion than the naysaying I launched into. ;)