Details
- Bug
- Resolution: Fixed
- Major
- None
- 3
Description
A Lustre write should not send an enqueue RPC to the MDS while holding an OSC or MDC LDLM lock. Currently this can happen via the following call path:
cl_io_loop
  cl_io_lock                          <- ldlm lock is taken here
  cl_io_start
    vvp_io_write_start
      ...
        __generic_file_aio_write
          file_remove_privs
            security_inode_need_killpriv
              ...
                ll_xattr_get_common
                  ...
                    mdc_intent_lock   <- enqueue rpc is sent here
  cl_io_unlock                        <- ldlm lock is released
That may lead to client eviction. The following scenario has been observed under write load with DoM involved:
- A write holds an MDC LDLM lock (L1) and is waiting for a free RPC slot in obd_get_request_slot while trying to do ll_xattr_get_common().
- All the RPC slots are occupied by write processes that are waiting for enqueue RPC completion.
- To serve the enqueue requests, the MDS has sent a blocking AST for lock L1, and it eventually evicts the client because the client does not cancel L1.
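The scenario above is a circular wait, which can be sketched as a small wait-for graph. This is only an illustrative model, not Lustre code: the node names and the find_cycle helper are hypothetical labels chosen for this example, and the edges restate the three steps above plus the lock release point in cl_io_unlock from the call path.

```python
from typing import Dict, List

# Wait-for graph: an edge A -> B means "A cannot make progress until B does".
# Node names are illustrative labels, not Lustre identifiers.
WAITS_FOR: Dict[str, List[str]] = {
    # The writer holds L1 and is blocked in obd_get_request_slot.
    "writer_holding_L1": ["free_rpc_slot"],
    # A slot becomes free only when an in-flight enqueue RPC completes.
    "free_rpc_slot": ["inflight_enqueue_rpcs"],
    # The MDS has sent a blocking AST for L1 in order to serve the enqueues.
    "inflight_enqueue_rpcs": ["cancel_of_L1"],
    # L1 is released only after the write finishes (cl_io_unlock).
    "cancel_of_L1": ["writer_holding_L1"],
}


def find_cycle(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Return the nodes of a cycle reachable from start, or [] if there is none."""
    path: List[str] = []   # current DFS path, in order
    on_path = set()        # same nodes as path, for O(1) membership tests
    done = set()           # nodes fully explored without finding a cycle

    def visit(node: str) -> List[str]:
        if node in on_path:
            # Back edge: the cycle is the path suffix starting at node.
            return path[path.index(node):] + [node]
        if node in done:
            return []
        on_path.add(node)
        path.append(node)
        for nxt in graph.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        on_path.discard(node)
        path.pop()
        done.add(node)
        return []

    return visit(start)


if __name__ == "__main__":
    print(" -> ".join(find_cycle(WAITS_FOR, "writer_holding_L1")))
    # Model of the rule from the description: the writer never waits for
    # an RPC slot while holding L1, so its outgoing edge disappears.
    no_wait = dict(WAITS_FOR, writer_holding_L1=[])
    print(find_cycle(no_wait, "writer_holding_L1"))
```

In this model the cycle disappears as soon as the writer no longer waits for an RPC slot while holding L1, which matches the rule stated at the top of the description.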
Another, more complex scenario caused by this problem has also been observed: clients get evicted by OSTs during mdtest+ior+failover hw testing.
Issue Links
- is related to
  - LU-15639: replay-dual test_31 error: set_param: param_path 'at_max': No such file or directory (Resolved)