Zam, I agree that you see improved performance after a "reset" because the allocator then uses the outer disk tracks (with higher linear velocity) rather than the inner disk tracks (with lower linear velocity). That is to be expected (the papers that I have read report up to a 50% difference).
My real issue is that this mechanism is only useful for benchmarking. It won't help with disks that are in use, and it won't even help with disks that are empty but have been in use for some time (unless someone does a manual reset).
My suggestions are possible ways that this could be fixed for real-world usage of the filesystem. Resetting the allocator to the first group will not help under normal usage, since it will have to scan the existing groups first. As you noted, mballoc will skip groups that are totally full, but it will still scan groups that are partly full, and it will then not allocate from those groups if their free space is fragmented (as one would expect in a real filesystem after it has been used for some time).
Conversely, the "elevator" style algorithm used today only revisits a group after it has scanned every other group once. This gives the maximum time for blocks to be freed in each group before the allocator tries to allocate from it again.
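To make the difference in scanning cost concrete, here is a toy model, assuming each group is summarized only by the length of its largest free extent and that an allocation simply shrinks that extent. `allocate` and `total_scanned` are hypothetical helpers, not mballoc code; the model ignores mballoc's multiple allocation passes, goal blocks, and any blocks freed between allocations, so it only shows the cost of re-scanning fragmented groups.

```python
def allocate(free, request, start):
    """Scan groups from `start`, wrapping around; return (group, groups_examined).

    `free[g]` is a hypothetical per-group "largest free extent" in blocks.
    """
    n = len(free)
    for i in range(n):
        g = (start + i) % n
        if free[g] >= request:
            free[g] -= request  # simplification: the extent shrinks in place
            return g, i + 1
    return None, n


def total_scanned(free, request, count, reset):
    """Total groups examined for `count` allocations of `request` blocks.

    reset=True  -> every allocation restarts at group 0
    reset=False -> elevator-style: continue from the last group used
    """
    free = list(free)  # don't modify the caller's list
    cursor, scanned = 0, 0
    for _ in range(count):
        g, examined = allocate(free, request, 0 if reset else cursor)
        scanned += examined
        if g is None:
            break
        cursor = g
    return scanned


# 8 groups with fragmented free space, followed by 8 mostly-empty groups
groups = [8] * 8 + [1024] * 8
print(total_scanned(groups, 64, 32, reset=True))   # 304
print(total_scanned(groups, 64, 32, reset=False))  # 41
```

In this model the reset policy walks past the 8 fragmented groups on every allocation, while the elevator-style cursor only passes over them once.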
Of course, a better (but more complex) mechanism is to keep an ordered list of groups with large free extents. That allows the allocator to quickly find a group with enough free space without having to scan fragmented groups. Also, if the groups were weighted/ordered so that lower-numbered ones were preferred over higher-numbered groups with an equal amount of free space, then you would get good behaviour for both your benchmarks and real-world usage.
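One possible shape for such an ordered list, sketched under the assumption that each group is tracked only by its largest free extent, and that groups whose largest extents fall in the same power-of-two bucket count as having "an equal amount of free space". Within a bucket the lower-numbered group sorts first. `GroupIndex` is a hypothetical name, and a kernel implementation would more likely use per-order linked lists or a tree with proper locking rather than a sorted Python list.

```python
import bisect


class GroupIndex:
    """Hypothetical index of block groups ordered by largest free extent.

    Key is (-order, group): larger power-of-two buckets come first, and
    within a bucket lower-numbered groups (assumed faster) come first.
    """

    def __init__(self, largest_free):
        # largest_free[g] = length in blocks of group g's largest free extent
        self.largest_free = list(largest_free)
        self.keys = sorted(self._key(g) for g in range(len(self.largest_free)))

    def _key(self, group):
        order = self.largest_free[group].bit_length() - 1  # -1 for a full group
        return (-order, group)

    def update(self, group, new_largest_free):
        """Re-file a group after an allocation or free changes its extent."""
        self.keys.remove(self._key(group))
        self.largest_free[group] = new_largest_free
        bisect.insort(self.keys, self._key(group))

    def find(self, request):
        """Return a group whose largest free extent fits `request`, or None."""
        for neg_order, group in self.keys:
            if self.largest_free[group] >= request:
                return group
            order = -neg_order
            if (1 << (order + 1)) <= request:
                # this bucket and all later ones hold extents < request
                return None
        return None


# Freshly emptied filesystem: all groups in one bucket, so group 0 wins.
print(GroupIndex([128] * 4).find(10))  # 0
```

On an aged filesystem, `find()` goes straight to a group in the largest bucket instead of walking the fragmented ones, while the tie-break still nudges allocations toward lower-numbered groups.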
For empty filesystems (or filesystems that are filled and emptied between test runs), the allocator would prefer the lower-numbered groups. For in-use filesystems, the allocator would also be able to quickly find groups with lots of free space, and while it could still be biased toward the faster parts of the disk, it would not waste time re-scanning groups that have fragmented free space.
This has been discussed with the ext4 developers in the past, and I think they would be willing to take this kind of patch upstream as well.
Please don't be discouraged by this. As you say, more than one person might want this code; it just does not seem like the Lustre code base is the right project to hold it. linux-ext4 is an active ext4 development list, and I truly encourage you to submit an RFC there.
If you can get the code into mainline, the path to other places becomes much easier.