Lustre / LU-12347

lustre write: do not send enqueue rpc while holding osc/mdc ldlm lock

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.15.0

    Description

      lustre's write path should not send an enqueue rpc to the mds while holding an osc or mdc ldlm lock. Currently this may happen via:

          cl_io_loop
            cl_io_lock                    <- ldlm lock is taken here
            cl_io_start
              vvp_io_write_start
              ...
                __generic_file_aio_write
                  file_remove_privs
                    security_inode_need_killpriv
                    ...
                      ll_xattr_get_common
                      ...
                        mdc_intent_lock   <- enqueue rpc is sent here
            cl_io_unlock                  <- ldlm lock is released
      

      That may lead to client eviction. The following scenario has been observed during write load with DoM involved:

      • write holds the mdc ldlm lock (L1) and is waiting for a free rpc slot
        in obd_get_request_slot() while trying to do ll_xattr_get_common().
      • all the rpc slots are busy with write processes that wait for enqueue
        rpc completion.
      • the mds, in order to serve those enqueue requests, has sent a blocking
        ast for lock L1 and eventually evicts the client because it does not
        cancel L1.

      Another, more complex scenario caused by this problem has also been observed: clients get evicted by OSTs during mdtest+ior+failover hw testing.

          Activity
            pjones Peter Jones added a comment -

            Landed for 2.15

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44151/
            Subject: LU-12347 llite: do not take mod rpc slot for getxattr
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: eb64594e4473af859e74a0e831316cead0f5c49b

            vsaveliev Vladimir Saveliev added a comment -

            could you describe that lockup with an example, please? There were several related scenarios, I've lost track a bit.

            With LU-13645 these scenarios became impossible.

            The following however is still possible:

            - clientA:write1 writes to file F, holds the mdc ldlm lock (L1) and
              is somewhere on the way to
              file_remove_privs()->ll_xattr_get_common()

            - clientB:write is going to write to file F and enqueues a DoM lock.
              The mds handles the conflict with L1 and sends a blocking ast to
              clientA

            - clientA: max_mod_rpcs_in_flight simultaneous creates occupy all mod
              rpc slots and get delayed on the mds side waiting for preallocated
              objects. Preallocation is delayed by ost failover.

            - clientA:write1 tries to get a mod rpc slot to send the xattr request,
              but all slots are busy, so lock L1 cannot be cancelled until one of
              the creates completes its rpc, which is stuck on preallocation.

            - the lock callback timer on the mds expires first and clientA gets
              evicted.
            

            This can be fixed by adding IT_GETXATTR to mdc_skip_mod_rpc_slot().
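
            For illustration, a minimal sketch of what such a change could look like,
            assuming the helper keeps roughly the shape it has in
            lustre/mdc/mdc_internal.h and that the IT_* intent opcodes and
            struct lookup_intent come from lustre/include/lustre_intent.h. The exact
            set of intents checked in the tree may differ; treat this as a sketch of
            the idea, not the landed patch:

                /* Sketch only: read-only intents should not consume a mod RPC
                 * slot.  Adding IT_GETXATTR lets the
                 * file_remove_privs()->ll_xattr_get_common() path proceed even
                 * when all mod RPC slots are held by stuck creates, so the
                 * conflicting DoM lock L1 can still be cancelled in time. */
                static inline bool mdc_skip_mod_rpc_slot(const struct lookup_intent *it)
                {
                        return it != NULL &&
                               (it->it_op == IT_GETATTR ||
                                it->it_op == IT_LOOKUP ||
                                it->it_op == IT_READDIR ||
                                it->it_op == IT_GETXATTR);   /* the proposed addition */
                }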

            but keep all other changes - IT_GETXATTR adding, removal of ols_has_ref

            ok, see https://review.whamcloud.com/44151

            and passing einfo in ldlm_cli_enqueue_fini(). Also I'd consider exclusion of DOM locks from consuming RPC slots similarly to EXTENT locks

            These are already in as part of https://review.whamcloud.com/36903.

            gerrit Gerrit Updater added a comment -

            Vladimir Saveliev (vlaidimir.saveliev@hpe.com) uploaded a new patch: https://review.whamcloud.com/44151
            Subject: LU-12347 llite: do not take mod rpc slot for getxattr
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c68d11fcbdd03113b5618ea93b7662a5e5790dce

            vsaveliev Vladimir Saveliev added a comment - - edited

            could you describe that lockup with an example, please? There were several related scenarios, I've lost track a bit.

            1. Have max_rpcs_in_flight writes to max_rpcs_in_flight files. Have them pause somewhere in file_remove_suid->ll_xattr_cache_refill.
            2. Have max_rpcs_in_flight writes to the same files from another client. The server will notice max_rpcs_in_flight conflicts and send blocking asts to the first client.
            3. The first client is unable to cancel the locks, as ll_xattr_cache_refill has to complete first.
            4. Have max_rpcs_in_flight new writes enqueue dlm locks (because the existing locks are callback pending). Those new writes occupy rpc slots, as their enqueues will complete only after the enqueues from client2 are completed.
            5. The first writes then want to enqueue in ll_xattr_find_get_lock, but all slots are occupied.

            Patchset 8 of https://review.whamcloud.com/#/c/34977/ contains this test: sanityn:105c.


            tappro Mikhail Pershin added a comment -

            So I would propose not to add an extra slot here but keep all other changes - IT_GETXATTR adding, removal of ols_has_ref and passing einfo in ldlm_cli_enqueue_fini(). Also I'd consider exclusion of DOM locks from consuming RPC slots similarly to EXTENT locks.


            tappro Mikhail Pershin added a comment -

            FYI, I've found why all slots are filled with BRW/glimpse enqueue RPCs. There is an ldlm_lock_match() in mdc_enqueue_send() which tries to find any granted or waiting lock so that a new similar lock is not enqueued, but the problem is that we have one callback-pending lock which can't be matched, and each new enqueue RPC gets stuck on the server waiting for it. Meanwhile, a new lock is put into the waiting queue on the client side only when it gets a reply from the server, i.e. when the enqueue RPC finishes, and there are no such locks yet: every one stays in an RPC slot waiting for the server response and is not added to the waiting queue, so each new enqueue matches no lock and goes to the server too, consuming slots.
            I have some observations and proposals about that.
            1) ldlm_request_slot_needed() takes a slot only for FLOCK and IBITS locks but not EXTENT. I suppose that is because IO locks need no flow control: they are usually the result of a file operation whose other RPCs are already sent under flow control. Maybe there are other reasons too. Anyway, the DOM enqueue RPC can also be excluded from taking an RPC slot similarly, as sketched below.
            2) All MDT locks are ATOMIC on the server, so the server waits for the lock to be granted before replying to the client. That keeps the enqueue RPC in a slot for quite a long time, which is also a reason for MDC RPC flow control: to limit the number of outgoing locks and not overload the client import. OSC IO locks are asynchronous and the server replies without waiting for the lock to be granted. DOM locks are also 'atomic' right now, so they wait for the lock to be granted on the server. If they were handled in an async manner, they would not get stuck in RPC slots forever waiting for blocking locks. I have such a patch here: https://review.whamcloud.com/36903 and I think it will help with the current issue.
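
            As a rough illustration of point 1), the DOM exclusion could look something
            like the sketch below. ldlm_request_slot_needed(), LDLM_FLOCK, LDLM_IBITS
            and MDS_INODELOCK_DOM are real Lustre names referenced above; passing the
            enqueue info here and the ei_type/ei_inodebits fields are assumptions about
            how the check would be plumbed through, not the actual content of
            https://review.whamcloud.com/36903:

                /* Sketch only: skip the MDC RPC slot for pure DOM enqueues, the
                 * same way EXTENT (OST IO) locks already skip it, so a DOM
                 * enqueue cannot pin a slot while the server waits for a
                 * conflicting lock to be cancelled. */
                static bool ldlm_request_slot_needed(struct ldlm_enqueue_info *einfo)
                {
                        if (einfo->ei_type == LDLM_IBITS &&
                            einfo->ei_inodebits == MDS_INODELOCK_DOM)
                                return false;   /* treat DOM like an IO lock */

                        /* flow control only for FLOCK and other IBITS enqueues */
                        return einfo->ei_type == LDLM_FLOCK ||
                               einfo->ei_type == LDLM_IBITS;
                }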

            tappro Mikhail Pershin added a comment - - edited

            could you describe that lockup with an example, please? There were several related scenarios, I've lost track a bit.


            vsaveliev Vladimir Saveliev added a comment -

            I would think that if lock is enqueued then no need to do the same again and again, especially considering that DOM has single range for all locks. I will check that part of code.

            Ok, that makes sense. However, even if you fix that, we still need a fix for the rpc slot lockup in ll_file_write->remove_file_suid, because the lockup may happen during concurrent writes to different files as well.

            tappro Mikhail Pershin added a comment - - edited

            yes, that is what I thought; probably this is the reason. I would think that if lock is enqueued then no need to do the same again and again, especially considering that DOM has single range for all locks. I will check that part of code.


            People

              Assignee: vsaveliev Vladimir Saveliev
              Reporter: vsaveliev Vladimir Saveliev
              Votes: 0
              Watchers: 7