  1. Lustre
  2. LU-8102

Correlate ptlrpc request with AST error

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.9.0

    Description

      With the current ldlm_handle_ast_error() code it can be difficult to correlate AST errors with the actual ptlrpc request. For example, when I see a message like:

      00010000:00020000:4.0:1462315066.553137:0:95408:0:(ldlm_lockd.c:673:ldlm_handle_ast_error()) ### client (nid 102@gni) failed to reply to blocking AST (req status 0 rc -11), evict it ns: filter-snx11155-OST0002_UUID lock: ffff88073e146180/0xe5dd239afc59ee37 lrc: 4/0,0 mode: PW/PW res: [0x743f4fd:0x0:0x0].0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x60000000000020 nid: 102@gni remote: 0xc162787564b54e4d expref: 41 pid: 107493 timeout: 7223452176 lvb_type: 0
      

      It is not always straightforward to figure out which ptlrpc request carried the blocking AST being referenced. With dlmtrace and rpctrace enabled, I can usually go back through the logs, find the thread that sent the AST or handled the lock request, and correlate the two by timestamp. However, ldlm_handle_ast_error() already takes the ptlrpc_request struct as one of its arguments, so we can easily enhance the debug messages to include extra information such as the address of the request and its xid.
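      As a rough illustration of the proposal, the user-space sketch below formats an AST error message that also carries the request address and xid. It is not the Lustre code: struct mock_request and format_ast_error() are hypothetical stand-ins, only the rq_xid and rq_status field names are borrowed from struct ptlrpc_request, and the "req@<address> x<xid>" layout is an assumption rather than the format of any actual patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the struct ptlrpc_request fields of interest. */
struct mock_request {
	uint64_t rq_xid;    /* transfer id of the RPC that carried the AST */
	int      rq_status; /* status recorded on the request */
};

/*
 * Build an AST error message that includes the request address and xid,
 * so the error can be matched against rpctrace output for the same RPC.
 * Returns the snprintf() result (length the full message needs).
 */
static int format_ast_error(char *buf, size_t len, const char *ast_type,
			    const char *nid, const struct mock_request *req,
			    int rc)
{
	return snprintf(buf, len,
			"client (nid %s) failed to reply to %s AST "
			"(req@%p x%" PRIu64 " status %d rc %d), evict it",
			nid, ast_type, (const void *)req, req->rq_xid,
			req->rq_status, rc);
}
```

      Calling format_ast_error(buf, sizeof(buf), "blocking", "102@gni", &req, -11) yields a message of the same shape as the one quoted above, with the request pointer and xid added inside the parentheses. Searching the rpctrace log for that xid would then lead directly to the matching RPC, with no need to line up time stamps.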

          People

            wc-triage WC Triage
            hornc Chris Horn