Ah, sorry for my misunderstanding. 
So then it seems your problem is "inode stealing", where another thread claims the inode bit/number in the group before you can. You end up contending for that lock because you have to retry over and over... (That is also why it makes some sense to count attempts rather than look directly at lock contention.)
I don't know what the comments on the list have been (it looks like silence so far), but it really bothers me to see "insert an arbitrary timer delay" as the solution here. That doesn't seem very future-proof. Isn't there something we can do directly about the stealing? Increase lock coverage, change how we find the bit, mark it in use before we do all the testing and clear it if the testing fails, that sort of thing? It's hard to say exactly what would be safe.
It looks like the current problem is that every thread does the work for every inode, but only one actually gets to use the work it does. So perhaps we should lock around the find_next_zero_bit and set the bit there. That complicates the error path, but it seems like it would (mostly) guarantee forward progress for each thread. Perhaps that's not safe for other reasons; perhaps we can't set that bit (even temporarily) until we know the results of the other things we check... I don't know...
But inserting a timed sleep...
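To make the "lock around the search and set the bit there" idea concrete, here is a rough userspace sketch in C. It is not the actual kernel code: `struct group`, `group_reserve`, `group_release`, `alloc_inode`, and the `check_pass`/`check_fail` callbacks are hypothetical names I made up, a pthread mutex stands in for the group lock, and a plain loop over a 64-bit word stands in for find_next_zero_bit. The point is only that finding and claiming the bit happen in one critical section, so two threads can never pick the same bit, and the error path has to clear the bit again.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define GROUP_BITS 64u  /* hypothetical group size, for illustration only */

/* Hypothetical stand-in for a block group's inode bitmap. */
struct group {
    pthread_mutex_t lock;       /* protects bitmap */
    unsigned long long bitmap;  /* bit i set => inode i is in use or reserved */
};

/* Find the first clear bit and set it inside the same critical section.
 * Returns the bit index, or -1 if every bit is already set. */
int group_reserve(struct group *g)
{
    int bit = -1;

    pthread_mutex_lock(&g->lock);
    for (unsigned i = 0; i < GROUP_BITS; i++) {
        if (!(g->bitmap & (1ULL << i))) {
            g->bitmap |= 1ULL << i;  /* claim it before anyone else can see it free */
            bit = (int)i;
            break;
        }
    }
    pthread_mutex_unlock(&g->lock);
    return bit;
}

/* Error path: give a reserved bit back. */
void group_release(struct group *g, int bit)
{
    assert(bit >= 0 && (unsigned)bit < GROUP_BITS);
    pthread_mutex_lock(&g->lock);
    g->bitmap &= ~(1ULL << bit);
    pthread_mutex_unlock(&g->lock);
}

/* Hypothetical placeholders for "the other things we check". */
bool check_pass(int bit) { (void)bit; return true; }
bool check_fail(int bit) { (void)bit; return false; }

/* Reserve first, verify second, undo the reservation if verification fails.
 * Returns the allocated bit, or -1 on failure. */
int alloc_inode(struct group *g, bool (*check)(int bit))
{
    int bit = group_reserve(g);

    if (bit < 0)
        return -1;
    if (!check(bit)) {
        group_release(g, bit);
        return -1;
    }
    return bit;
}
```

With this shape, each thread that gets a bit back from `group_reserve` owns it until it either commits or releases, so the expensive checks are never thrown away because another thread got there first. Whether it is safe for the bit to be visibly set while those checks are still running is exactly the open question above, and the sketch does not answer it.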
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28276/
Subject: LU-9796 kernel: improve metadata performaces for RHEL7
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 17fe3c192e101ace75b2f4d7f7e9ff7d8d85480e