Shilong,
I'm not on the ext4 list so I won't comment there, but that code should almost certainly use ext4_fs_is_busy rather than counting attempts directly.
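To illustrate the difference between counting attempts at one call site and asking a shared "is the filesystem busy" question, here is a rough userspace sketch in C11. It is not the ext4 code: the struct, the function names (fs_lock, fs_is_busy, and so on) and both thresholds are hypothetical, and it only assumes that a helper like ext4_fs_is_busy reads a contention estimate that the locking path keeps up to date.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical thresholds, chosen only for illustration. */
#define BUSY_THRESHOLD 2
#define BUSY_MAX       8

struct fs_state {
    atomic_bool locked;    /* stand-in for a spinlock */
    atomic_int  lock_busy; /* recent-contention estimate, 0..BUSY_MAX */
};

static void fs_state_init(struct fs_state *fs)
{
    atomic_init(&fs->locked, false);
    atomic_init(&fs->lock_busy, 0);
}

/* Add delta to *v unless *v already equals limit. */
static void add_unless(atomic_int *v, int delta, int limit)
{
    int cur = atomic_load(v);
    while (cur != limit &&
           !atomic_compare_exchange_weak(v, &cur, cur + delta))
        ; /* cur was refreshed by the failed exchange; retry */
}

static void fs_lock(struct fs_state *fs)
{
    if (!atomic_exchange(&fs->locked, true)) {
        /* Got the lock at once: let the estimate decay toward 0. */
        add_unless(&fs->lock_busy, -1, 0);
        return;
    }
    /* Lock was held: raise the estimate (capped), then spin. */
    add_unless(&fs->lock_busy, 1, BUSY_MAX);
    while (atomic_exchange(&fs->locked, true))
        ;
}

static void fs_unlock(struct fs_state *fs)
{
    atomic_store(&fs->locked, false);
}

/* Callers ask this instead of keeping their own retry counters. */
static bool fs_is_busy(struct fs_state *fs)
{
    return atomic_load(&fs->lock_busy) > BUSY_THRESHOLD;
}
```

The point of the design is that the estimate reflects contention seen by every locker, so a caller deciding whether to back off sees the state of the whole filesystem rather than only its own failed attempts.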
Separately, did you try that patch on the current upstream kernel?
Is the problem spinlock contention itself (most of the time spent in the lock/unlock operations rather than waiting for a held lock, which is fixed in newer kernels), or is it actually waiting for the lock (most of the time spent waiting because the lock really is held)?
Anyway, if you haven't, you should try this on a newer kernel. Spinlock contention as a performance problem is more or less fixed with queued spinlocks. (Multiple waiters for a spinlock now have minimal performance impact on lock/unlock, whereas in earlier kernels multiple waiters caused locking and unlocking to take many times longer. Queued spinlocks won't fix the problem of having to wait for the lock, but they remove lock contention itself as the cause of performance issues.) RHEL7 doesn't have them, sadly.
If your problem really is spinlock contention, it's not going to show up on newer kernels. We might still want the patch for RHEL7.
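For anyone wondering why queued spinlocks make extra waiters cheap, a minimal userspace sketch of an MCS-style queue lock (the idea the kernel's queued spinlocks build on) is below. The names are hypothetical and the real kernel implementation is considerably more involved; this only shows that each waiter spins on its own node, so lock and unlock do not get slower as waiters pile up on one shared word.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mcs_node {
    _Atomic(struct mcs_node *) next; /* successor in the wait queue */
    atomic_bool locked;              /* true while this waiter must spin */
};

struct mcs_lock {
    _Atomic(struct mcs_node *) tail; /* last waiter, NULL if lock is free */
};

static void mcs_init(struct mcs_lock *l)
{
    atomic_init(&l->tail, NULL);
}

static void mcs_acquire(struct mcs_lock *l, struct mcs_node *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);

    /* Join the queue with a single atomic swap on the tail. */
    struct mcs_node *prev = atomic_exchange(&l->tail, me);
    if (prev == NULL)
        return; /* queue was empty: we own the lock */

    /* Link behind the previous waiter, then spin on our own flag only. */
    atomic_store(&prev->next, me);
    while (atomic_load(&me->locked))
        ;
}

static void mcs_release(struct mcs_lock *l, struct mcs_node *me)
{
    struct mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        /* No visible successor: try to mark the lock free. */
        struct mcs_node *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;
        /* Someone is mid-enqueue: wait for them to link in. */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;
    }
    /* Hand the lock directly to the next waiter. */
    atomic_store(&succ->locked, false);
}
```

Note this does nothing for the other case: if the lock really is held for a long time, waiters still wait just as long.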
I see recent mods in master have landed for this issue, but only for RHEL 7.x.
Is this not needed for RHEL 6.x, SLES 11/12, or Ubuntu?