Details
- Bug
- Resolution: Fixed
- Major
- None
- 3
Description
Lustre's write path should not send an enqueue RPC to the MDS while holding an OSC or MDC LDLM lock. Currently this can happen via:
cl_io_loop
  cl_io_lock <- ldlm lock is taken here
  cl_io_start
    vvp_io_write_start
      ...
        __generic_file_aio_write
          file_remove_privs
            security_inode_need_killpriv
              ...
                ll_xattr_get_common
                  ...
                    mdc_intent_lock <- enqueue rpc is sent here
  cl_io_unlock <- ldlm lock is released
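The ordering problem in the trace above can be modelled with a small counter-based check. This is only an illustrative sketch: `struct io_ctx` and every `model_*` function are hypothetical names, not Lustre APIs, and the reordered variant shows just one way the hazard could be avoided, not necessarily the fix that was adopted for this ticket.

```c
#include <assert.h>

/* Hypothetical model of one write I/O; none of these names exist in Lustre. */
struct io_ctx {
    int locks_held;  /* LDLM locks currently held by this I/O */
    int violations;  /* enqueue RPCs attempted while a lock was held */
};

/* Stands in for cl_io_lock: the LDLM lock is taken here. */
static void model_io_lock(struct io_ctx *c)
{
    c->locks_held++;
}

/* Stands in for cl_io_unlock: the LDLM lock is released here. */
static void model_io_unlock(struct io_ctx *c)
{
    assert(c->locks_held > 0);
    c->locks_held--;
}

/* Stands in for the point where mdc_intent_lock sends an enqueue RPC. */
static void model_enqueue_rpc(struct io_ctx *c)
{
    if (c->locks_held > 0)
        c->violations++;  /* the situation this ticket describes */
}

/* Order from the trace: lock, xattr fetch during write start, unlock. */
static int model_write_current(void)
{
    struct io_ctx c = { 0, 0 };

    model_io_lock(&c);
    model_enqueue_rpc(&c);  /* file_remove_privs -> ... -> ll_xattr_get_common */
    model_io_unlock(&c);
    return c.violations;
}

/* Assumed alternative: do the privilege check before taking the lock. */
static int model_write_reordered(void)
{
    struct io_ctx c = { 0, 0 };

    model_enqueue_rpc(&c);
    model_io_lock(&c);
    model_io_unlock(&c);
    return c.violations;
}
```

In this model `model_write_current()` reports one violation and `model_write_reordered()` reports none, because the enqueue is only counted when it is issued between the lock and unlock steps.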
This can lead to client eviction. The following scenario has been observed under write load with DoM involved:
- a write holds an mdc ldlm lock (L1) and is waiting for a free rpc slot in obd_get_request_slot while trying to do ll_xattr_get_common().
- all the rpc slots are occupied by write processes which are waiting for enqueue rpc completion.
- the mds, in order to serve the enqueue requests, has sent a blocking ast for the lock L1 and eventually evicts the client because it does not cancel L1.
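The circular wait in the scenario above can be expressed as a simple predicate. The sketch below is a toy model under stated assumptions: `struct client_model` and both functions are hypothetical, and it assumes every occupied slot carries an enqueue that the mds cannot grant until L1 is cancelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of one client; not a Lustre data structure. */
struct client_model {
    int  rpc_slots;         /* total rpc slots available to the client */
    int  slots_in_use;      /* slots held by enqueues blocked behind L1 */
    bool holder_needs_rpc;  /* L1's holder must send an rpc before releasing L1 */
};

/* True if the holder of L1 can finish its write and cancel L1. */
static bool holder_can_progress(const struct client_model *m)
{
    assert(m->slots_in_use <= m->rpc_slots);
    if (!m->holder_needs_rpc)
        return true;                       /* no slot needed, L1 gets released */
    return m->slots_in_use < m->rpc_slots; /* needs at least one free slot */
}

/*
 * True when the in-flight enqueues wait for L1 while L1's holder waits
 * for a slot, so nothing moves until the mds evicts the client.
 */
static bool is_deadlocked(const struct client_model *m)
{
    return m->slots_in_use > 0 && !holder_can_progress(m);
}
```

With 8 slots all in use and a holder that still needs an rpc, `is_deadlocked` returns true. Freeing a single slot, or removing the holder's need to send an rpc while L1 is held, makes it return false.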
Another, more complex scenario caused by this problem has also been observed: clients get evicted by osts during mdtest+ior+failover hw testing.
Issue Links
- is related to
  - LU-15639 replay-dual test_31 error: set_param: param_path 'at_max': No such file or directory (Open)