Details
Type: Improvement
Resolution: Fixed
Priority: Minor
Description
With the current ldlm_handle_ast_error() code, it can be difficult to correlate AST errors with the actual ptlrpc request. For example, when I see a message like:
00010000:00020000:4.0:1462315066.553137:0:95408:0:(ldlm_lockd.c:673:ldlm_handle_ast_error()) ### client (nid 102@gni) failed to reply to blocking AST (req status 0 rc -11), evict it ns: filter-snx11155-OST0002_UUID lock: ffff88073e146180/0xe5dd239afc59ee37 lrc: 4/0,0 mode: PW/PW res: [0x743f4fd:0x0:0x0].0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x60000000000020 nid: 102@gni remote: 0xc162787564b54e4d expref: 41 pid: 107493 timeout: 7223452176 lvb_type: 0
It is not always straightforward to figure out which ptlrpc request carried the blocking AST being referenced. If I have dlmtrace and rpctrace enabled, I can usually go back in the logs, find the thread that sent the AST or handled the lock request, and correlate based on timestamps. However, ldlm_handle_ast_error() takes the ptlrpc_request struct as one of its arguments, so we can easily enhance the debug messages to include extra information such as the address of the request and its xid.
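To illustrate the kind of message being proposed, here is a minimal user-space sketch, not the actual patch. The `mock_request` struct and the `format_ast_error()` helper are hypothetical stand-ins. Only the `rq_xid` field name is taken from the real `struct ptlrpc_request`, and the exact wording and field order of the final message are assumptions.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the fields of struct ptlrpc_request that the
 * enhanced message would print. The real struct is much larger.
 */
struct mock_request {
	uint64_t rq_xid;    /* transfer id that identifies the RPC in rpctrace logs */
	int      rq_status; /* status carried in the reply, if any */
};

/*
 * Format an AST error message that also carries the request address and xid,
 * so the error can be matched against rpctrace lines for the same request.
 * Returns the snprintf() result (length the full message needs).
 */
static int format_ast_error(char *buf, size_t len,
			    const struct mock_request *req,
			    const char *nid, const char *ast_type, int rc)
{
	return snprintf(buf, len,
			"client (nid %s) failed to reply to %s AST "
			"(req@%p x%" PRIu64 " status %d rc %d), evict it",
			nid, ast_type, (const void *)req, req->rq_xid,
			req->rq_status, rc);
}
```

Calling `format_ast_error(buf, sizeof(buf), &req, "102@gni", "blocking", -11)` produces a line in which the `x<xid>` token can be searched for directly in the rpctrace output, instead of correlating by timestamp.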