Lustre / LU-17493

restore LDLM cancel on blocking callback

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.14.0, Lustre 2.16.0, Lustre 2.17.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      In the old days of Catamount on ASCI Red, liblustre ran inside the Catamount OS, which did not have any CPU interrupts. That meant any server-to-client requests (such as DLM lock cancellations) had to be handled asynchronously on the client, whenever the application yielded the processor to filesystem administrative tasks.

      In that environment, the server would assume that a DLM lock with LDLM_FL_CANCEL_ON_BLOCK set was cancelled as soon as the blocking AST was sent, rather than waiting for the client to reply to the AST and actually cancel the lock. This avoided potentially significant delays when the server granted conflicting locks.
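      The difference between the two behaviours can be sketched as a toy timing model in Python (illustrative only, not Lustre code): without the flag, a conflicting request waits for the slowest lock holder, and an unresponsive holder costs a full eviction timeout; with LDLM_FL_CANCEL_ON_BLOCK the server stops waiting once the ASTs are sent. The function name grant_delay, its parameters, and the single eviction_timeout value are assumptions made for the example.

```python
from typing import Optional, Sequence


def grant_delay(holder_reply_times: Sequence[Optional[float]],
                cancel_on_block: bool,
                eviction_timeout: float) -> float:
    """Toy model: seconds until a conflicting lock can be granted.

    holder_reply_times holds, per current lock holder, the time that client
    needs to handle the blocking AST and cancel; None marks a client that
    never replies and can only be removed by eviction.
    """
    if cancel_on_block:
        # The server treats every lock as cancelled once the ASTs are sent,
        # so nothing is left to wait for (the cost of sending is ignored).
        return 0.0
    delay = 0.0
    for reply in holder_reply_times:
        # ASTs go out in parallel, so the slowest holder sets the delay;
        # an unresponsive holder costs the full eviction timeout.
        wait = eviction_timeout if reply is None else min(reply, eviction_timeout)
        delay = max(delay, wait)
    return delay
```

      For example, with three holders of which one is hung, and an assumed 100-second eviction timeout, grant_delay([0.01, 0.02, None], False, 100.0) returns 100.0, while the same call with cancel_on_block=True returns 0.0.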

      In large clusters, some locks are invariably highly contended (e.g. the ROOT/, /home/, or /project directories), either because many clients hold a read lock while some client wants to modify the directory, or because of conflicting workloads (e.g. "ls -l" or "rm" in a directory tree that is actively in use by other clients). If any client holding a contended lock has a problem (for example LU-17453/LU-17476), other nodes accessing that lock may block for tens or hundreds of seconds until the lock is cancelled or the client is evicted.

      It would be useful if LDLM_FL_CANCEL_ON_BLOCK were used for such highly contended resources when they are requested in LCK_PR mode, so that the server can send asynchronous ASTs to all clients, then cancel the DLM locks immediately and perform the required operation without being blocked by unresponsive clients. Any responsive client will receive the AST and will not even need to send a cancel RPC, while unresponsive clients are unlikely to know or care whether the server sent the AST at all; they will have an inconsistent local state until they next contact the server (as they already do today).
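      The proposed policy can be sketched as a small Python model (illustrative only, not the MDS implementation): a PR lock granted on a resource already known to be contended gets the cancel-on-block flag, and when a conflicting request arrives the server sends ASTs to every holder but keeps waiting only on the locks without the flag. The Lock and Resource classes, the grant() and revoke_for_conflict() functions, and the boolean contended field are assumptions made for the example; the mode strings just reuse the LCK_* names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Lock:
    client: str
    mode: str                      # e.g. "LCK_PR", "LCK_PW"
    cancel_on_block: bool = False


@dataclass
class Resource:
    name: str
    contended: bool                # how contention is detected is out of scope
    granted: List[Lock] = field(default_factory=list)


def grant(res: Resource, client: str, mode: str) -> Lock:
    # Proposed policy: only PR locks on contended resources get the flag.
    lock = Lock(client, mode, cancel_on_block=(mode == "LCK_PR" and res.contended))
    res.granted.append(lock)
    return lock


def revoke_for_conflict(res: Resource, send_ast: Callable[[Lock], None]) -> List[Lock]:
    """Send blocking ASTs to every holder and return the locks the server
    must still wait on (those without cancel_on_block)."""
    still_waiting = []
    for lock in res.granted:
        send_ast(lock)             # fire-and-forget; no reply is awaited here
        if not lock.cancel_on_block:
            still_waiting.append(lock)
    # Cancel-on-block locks are dropped on the server right after the AST.
    res.granted = still_waiting
    return still_waiting
```

      Deciding at grant time, rather than when the conflict arrives, keeps the revoke path simple: it only has to check one field per lock.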

      This could also be tied into allowing "ls" (readdir()) to run with LDLM_FL_CANCEL_ON_BLOCK locks, or with no DLM locks at all on the directory or its inodes. Per comments in LU-3308, POSIX does not require readdir() to be fully cache coherent, even among processes on the same node; it only requires that the readdir cache be reset by rewinddir() and close().


          Activity

            adilger Andreas Dilger added a comment -

            Yes, getting the directory locks with cancel-on-block would prevent the MDS from blocking access to the whole filesystem if, e.g., a client holding a lock on the root directory suddenly becomes unresponsive. I think the ROOT/ (or subdirectory mount) directory should always be a candidate for CANCEL_ON_BLOCK, as should other directories that have a large number of lock holders (as decided by the MDS).

            We might also consider applying this to all IBITS LCK_PR locks held by a client if it has been evicted more than once within some time period (e.g. 15 minutes)?
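            The candidate rules in this comment can be combined into one decision, sketched below in Python under several assumptions: the MDS already knows whether the directory is the ROOT/ (or subdirectory mount) directory and how many holders it has, eviction times are tracked per client in a sliding window, and only LCK_PR requests are considered, as in the description. The class and method names and the HOLDER_THRESHOLD value of 64 are hypothetical; only the 15-minute window comes from the comment.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

EVICTION_WINDOW = 15 * 60   # seconds; the "e.g. 15 minutes" from the comment
HOLDER_THRESHOLD = 64       # hypothetical cut-off for "many lock holders"


class CancelOnBlockPolicy:
    """Decide whether a new IBITS lock should get CANCEL_ON_BLOCK (sketch)."""

    def __init__(self) -> None:
        self.evictions: Dict[str, Deque[float]] = defaultdict(deque)

    def record_eviction(self, client: str, now: Optional[float] = None) -> None:
        self.evictions[client].append(time.monotonic() if now is None else now)

    def _evicted_repeatedly(self, client: str, now: float) -> bool:
        history = self.evictions[client]
        # Forget evictions that fell out of the sliding window.
        while history and now - history[0] > EVICTION_WINDOW:
            history.popleft()
        return len(history) > 1     # "evicted more than once"

    def should_apply(self, client: str, mode: str, is_mount_root: bool,
                     holder_count: int, now: Optional[float] = None) -> bool:
        if mode != "LCK_PR":
            return False
        if now is None:
            now = time.monotonic()
        return (is_mount_root
                or holder_count >= HOLDER_THRESHOLD
                or self._evicted_repeatedly(client, now))
```

            A deque per client keeps the sliding-window check cheap: stale eviction timestamps are discarded from the front each time the policy is consulted.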
            squalfof Keguang Xu added a comment - - edited

            Hi adilger, green, how about expanding the concept of a “contended directory” here?

            • A large directory containing hundreds of thousands of entries, where an LCK_PR lock held by an "ls" operation could block rm/rename/create operations. In this scenario the directory isn’t necessarily "hot", so the directory inode's i_size could serve as a useful indicator?
            • A hot directory under heavy access, with tens of concurrent operations. For a small directory, "ls" shouldn’t take much time; however, "a problematic client holding an LCK_PR lock may cause other nodes to be blocked for tens or even hundreds of seconds until the lock is canceled or the client is evicted". In this case, adjusted LDLM contention criteria might apply. We're not aiming to address less-contended directories with a problematic client here, since the impact would be limited to fewer clients.
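            The two cases above amount to a simple classification, sketched in Python below. Both thresholds, the DirStats fields, and the function name are assumptions for illustration; in particular, how the MDS would count "concurrent operations" on a directory is not specified in this issue.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds, for illustration only.
LARGE_DIR_ISIZE = 8 * 1024 * 1024   # bytes; i_size as a proxy for entry count
HOT_DIR_CONCURRENT_OPS = 10         # "tens of concurrent operations"


@dataclass
class DirStats:
    i_size: int           # size of the directory inode in bytes
    concurrent_ops: int   # operations currently contending for the directory lock


def contention_reason(d: DirStats) -> Optional[str]:
    """Return why a directory counts as contended, or None if it does not."""
    if d.i_size >= LARGE_DIR_ISIZE:
        # A long "ls" holding LCK_PR can stall rm/rename/create.
        return "large"
    if d.concurrent_ops >= HOT_DIR_CONCURRENT_OPS:
        # A single stuck PR holder would block many clients.
        return "hot"
    # Lightly used directories are explicitly out of scope.
    return None
```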

            A follow-up question: from the discussion in LU-3308, “keeping cached readdir() data after cancellation would allow better performance.” Should that be addressed separately, in another patch targeting large directories?

            pjones Peter Jones made changes -
            Assignee Original: WC Triage [ wc-triage ] New: Keguang Xu [ squalfof ]

            gerrit Gerrit Updater added a comment -

            "kg.xu <squalfof@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/59862
            Subject: LU-17493 mdc: restore LDLM cancel on blocking callback
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 44bcc5a5c51f01fc71880361de2545fa6dec5dae

            adilger Andreas Dilger added a comment -

            Hi squalfof, it would be prudent to limit this behavior to readdir locks to start, since they have specific "weak" semantics under POSIX, so pre-emptively cancelling them from the server does not introduce any consistency issues.

            It would also be good to get input from green on this topic, since he is the expert in this area.
            squalfof Keguang Xu added a comment -

            Hi @Andreas, I have some questions that I need your help to clarify:
            1. Is the scope of this issue limited to "ls" only?
            2. The definition of "contended" can be found in LDLM. Should we apply the cancel logic to ANY directory that is "contended"? Or should we just cancel ANY lock tagged with CANCEL_ON_BLOCK?

            Thanks.

            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-18759 [ LU-18759 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-17446 [ LU-17446 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-16564 [ LU-16564 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to DDN-4644 [ DDN-4644 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to DDN-4654 [ DDN-4654 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-11509 [ LU-11509 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-3308 [ LU-3308 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LR-9 [ LR-9 ]

            People

              Assignee: squalfof Keguang Xu
              Reporter: adilger Andreas Dilger
              Votes: 2
              Watchers: 10
