Details
-
Bug
-
Resolution: Unresolved
-
Medium
Description
Sometimes the number of RPCs in flight reported by osc.*.rpc_stats is well above my setting of osc.*.max_rpcs_in_flight=8:
                      read       |      write
rpcs in flight    rpcs   % cum % |   rpcs   % cum %
1:                   0   0     0 |      1   0     0
2:                   0   0     0 |     16   0     0
3:                   0   0     0 |      2   0     1
4:                   0   0     0 |      3   0     1
5:                   0   0     0 |      9   0     1
6:                   0   0     0 |     41   2     4
7:                   0   0     0 |    132   7    11
8:                   0   0     0 |    305  17    28
9:                   0   0     0 |    505  28    56
10:                  0   0     0 |    363  20    77
11:                  0   0     0 |    195  10    88
12:                  0   0     0 |    107   5    94
13:                  0   0     0 |     59   3    97
14:                  0   0     0 |     30   1    99
15:                  0   0     0 |     11   0    99
16:                  0   0     0 |      5   0    99
17:                  0   0     0 |      1   0   100
This is true for both direct and cached IO.
After some investigation, it turned out that the RPC in flight accounting is racy: https://github.com/lustre/lustre-release/blob/master/lustre/osc/osc_cache.c#L2214
The code takes the lock to check max_rpcs_in_flight, releases the lock, and only then sends the RPC. Two or more threads can therefore pass the check at the same time and each send an RPC, so the number of RPCs in flight exceeds the configured max_rpcs_in_flight.
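To illustrate the pattern, here is a minimal Python sketch of a check-then-act race of this kind. It is not the Lustre code: InFlightCounter, has_slot, add, try_reserve and peak_in_flight are all hypothetical names, a barrier is used only to force the bad interleaving deterministically, and no RPC ever completes, so the counter only grows. The non-racy branch shows one common way to close such a window (check and count under the same lock hold); I am not claiming this is how it should be fixed in osc_cache.c.

```python
import threading


class InFlightCounter:
    """Hypothetical stand-in for the per-OSC in-flight accounting."""

    def __init__(self, max_in_flight):
        self.lock = threading.Lock()
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.peak = 0

    def has_slot(self):
        # Racy pattern, step 1: check the limit, then drop the lock.
        with self.lock:
            return self.in_flight < self.max_in_flight

    def add(self):
        # Racy pattern, step 2: count the RPC later, under a separate lock hold.
        with self.lock:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)

    def try_reserve(self):
        # Check and count in one critical section, so the limit holds.
        with self.lock:
            if self.in_flight >= self.max_in_flight:
                return False
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            return True


def peak_in_flight(racy, senders=4, max_in_flight=2):
    """Run `senders` threads once each and return the peak in-flight count."""
    counter = InFlightCounter(max_in_flight)
    barrier = threading.Barrier(senders)

    def sender():
        if racy:
            ok = counter.has_slot()
            # Force every sender past the check before any of them is counted.
            barrier.wait()
            if ok:
                counter.add()
        else:
            counter.try_reserve()

    threads = [threading.Thread(target=sender) for _ in range(senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.peak


if __name__ == "__main__":
    print("racy peak:", peak_in_flight(racy=True))
    print("atomic peak:", peak_in_flight(racy=False))
```

With 4 senders and a limit of 2, the racy variant counts all 4 as in flight because each one saw a free slot before any was counted, while the variant that reserves the slot under the same lock hold stops at 2.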