[LU-8102] Correlate ptlrpc request with AST error Created: 04/May/16 Updated: 15/Jun/16 Resolved: 15/Jun/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Chris Horn | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Description |
|
With the current ldlm_handle_ast_error() code, it can be difficult to correlate AST errors with the actual ptlrpc request. For example, when I see a message like:
00010000:00020000:4.0:1462315066.553137:0:95408:0:(ldlm_lockd.c:673:ldlm_handle_ast_error()) ### client (nid 102@gni) failed to reply to blocking AST (req status 0 rc -11), evict it ns: filter-snx11155-OST0002_UUID lock: ffff88073e146180/0xe5dd239afc59ee37 lrc: 4/0,0 mode: PW/PW res: [0x743f4fd:0x0:0x0].0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x60000000000020 nid: 102@gni remote: 0xc162787564b54e4d expref: 41 pid: 107493 timeout: 7223452176 lvb_type: 0
it is not always straightforward to figure out which ptlrpc request contained the blocking AST being referenced. If I have dlmtrace and rpctrace enabled, I can usually go back in the logs, find the thread that sent the AST or handled the lock request, and correlate by timestamp. However, ldlm_handle_ast_error() takes the ptlrpc_request struct as one of its arguments, so we can easily enhance the debug messages to include extra information such as the address of the request and its xid. |
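To illustrate the kind of change proposed above, here is a minimal standalone sketch in C. It is not the patch itself: struct mock_request and format_ast_error() are hypothetical stand-ins invented for this example, and only the field names rq_xid and rq_status are meant to mirror ptlrpc_request. The point is simply that printing the request address and xid in the error line gives something to search for in an rpctrace log.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the two request fields of interest here;
 * the real ptlrpc_request struct is far larger. */
struct mock_request {
	uint64_t rq_xid;    /* id that identifies the RPC in the logs */
	int      rq_status; /* status carried by the request */
};

/* Build an AST error line that names the request by address and xid.
 * Returns the snprintf() result: the length the full message needs,
 * or a negative value on an output error. */
static int format_ast_error(char *buf, size_t len, const char *ast_type,
			    struct mock_request *req, int rc)
{
	return snprintf(buf, len,
			"client failed to reply to %s AST "
			"(req@%p x%" PRIu64 " status %d rc %d), evict it",
			ast_type, (void *)req, req->rq_xid,
			req->rq_status, rc);
}
```

Calling format_ast_error(buf, sizeof(buf), "blocking", &req, -11) with a request whose rq_xid is set yields a line containing both the pointer and the xid, so the error can be matched against the log entry recorded when the AST was sent.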
| Comments |
| Comment by Gerrit Updater [ 04/May/16 ] |
|
Chris Horn (hornc@cray.com) uploaded a new patch: http://review.whamcloud.com/19983 |
| Comment by Gerrit Updater [ 14/Jun/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19983/ |
| Comment by Joseph Gmitter (Inactive) [ 15/Jun/16 ] |
|
Patch has landed on master for 2.9.0. |