Details
Type: Improvement
Resolution: Fixed
Priority: Minor
Description
With the current ldlm_handle_ast_error() code, it can be difficult to correlate AST errors with the actual ptlrpc request. For example, when I see a message like:
00010000:00020000:4.0:1462315066.553137:0:95408:0:(ldlm_lockd.c:673:ldlm_handle_ast_error()) ### client (nid 102@gni) failed to reply to blocking AST (req status 0 rc -11), evict it ns: filter-snx11155-OST0002_UUID lock: ffff88073e146180/0xe5dd239afc59ee37 lrc: 4/0,0 mode: PW/PW res: [0x743f4fd:0x0:0x0].0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x60000000000020 nid: 102@gni remote: 0xc162787564b54e4d expref: 41 pid: 107493 timeout: 7223452176 lvb_type: 0
It is not always straightforward to figure out which ptlrpc request contained the blocking AST being referenced. If I have dlmtrace and rpctrace enabled, I can usually go back in the logs to the thread that sent the AST or handled the lock request and correlate the two based on timestamps. However, ldlm_handle_ast_error() takes the ptlrpc_request struct as one of its arguments, so we can easily enhance the debug messages to include extra information such as the address of the request and its xid.
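To illustrate the idea, here is a minimal userspace sketch of what an enhanced message could look like. It is not the landed patch: struct mock_ptlrpc_request and format_ast_error() are hypothetical stand-ins, the field names are assumed to mirror the ones used in the kernel structure, and the real code would go through the Lustre debug macros rather than snprintf().

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the few ptlrpc_request fields this sketch
 * needs. The real kernel structure is much larger.
 */
struct mock_ptlrpc_request {
	uint64_t rq_xid;    /* transfer id that identifies the RPC */
	int      rq_status; /* status carried by the request */
};

/*
 * Hypothetical helper: build the AST error text with the request
 * address and xid added, so the message can be matched directly
 * against rpctrace output for the same request.
 * Returns the snprintf() result (length that was or would be written).
 */
static int format_ast_error(char *buf, size_t len, const char *ast_type,
			    const char *nid,
			    const struct mock_ptlrpc_request *req, int rc)
{
	return snprintf(buf, len,
			"client (nid %s) failed to reply to %s AST "
			"(req@%p x%" PRIu64 " status %d rc %d), evict it",
			nid, ast_type, (const void *)req, req->rq_xid,
			req->rq_status, rc);
}
```

With the address and xid in the message, a search of the debug log for the same "x<xid>" string would lead straight to the request that carried the AST, without relying on timestamps.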
Patch has landed on master for 2.9.0.