Microbenchmarks against a previous version only show that they've made a relative improvement.
The benchmarks that would matter to us (or at least me) are:
1. How does it compare to an equivalent implementation that doesn't have to maintain insertion order?
2. How does it hold up under stress (larger data sets and heavy load, where GC/compaction has to occur)?
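
For what it's worth, here is a rough sketch of the kind of stress test I mean. It's in Python purely for illustration, and `churn_benchmark` and `make_map` are names I made up, not part of any library. It keeps the map at a fixed size while repeatedly deleting a random key and inserting a fresh one, so whatever reclaims or compacts freed slots has to do real work. Pass in the ordered implementation and an unordered equivalent as `make_map` and compare the per-operation cost as the size grows.

```python
import random
import time


def churn_benchmark(make_map, size, rounds=3, seed=0):
    """Return average seconds per delete+insert on a map holding `size` keys.

    `make_map` is any zero-argument callable returning an empty mapping
    (hypothetical hook: plug in whichever implementations you want to compare).
    """
    rng = random.Random(seed)
    m = make_map()
    live = list(range(size))  # keys currently stored in the map
    for k in live:
        m[k] = k

    next_key = size
    ops = rounds * size
    start = time.perf_counter()
    for _ in range(ops):
        # Delete a random live key and insert a brand-new one, so the
        # map stays at `size` entries but keeps accumulating freed slots.
        i = rng.randrange(size)
        del m[live[i]]
        live[i] = next_key
        m[next_key] = next_key
        next_key += 1
    elapsed = time.perf_counter() - start
    return elapsed / ops


if __name__ == "__main__":
    # Sizes are arbitrary; raise them until memory pressure shows up.
    for size in (1_000, 10_000, 100_000):
        per_op = churn_benchmark(dict, size)
        print(f"{size:>7} keys: {per_op * 1e9:.0f} ns per delete+insert")
```

A flat per-operation cost across sizes is the good outcome; a curve that bends upward is where the relative improvement stops telling you much.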
Obviously none of this should matter to you, since you've said your data load is low with no growth. So Bob's your uncle.