[LU-1094] general protection fault in _debug_req() Created: 10/Feb/12 Updated: 30/Apr/12 Resolved: 30/Apr/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Ned Bass | Assignee: | Oleg Drokin |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | paj | ||
| Environment: | |||
| Severity: | 3 |
| Rank (Obsolete): | 6461 |
| Description |
|
We had five occurrences of this crash on OSS nodes in our classified Lustre 2.1 cluster. Timeframe coincided with
LustreError: 14210:0:(genops.c:1270:class_disconnect_stale_exports()) ls5-OST0349: disconnect stale client [UUID]@<unknown>
general protection fault: 0000 1 SMP
machine_kexec |
| Comments |
| Comment by Ned Bass [ 10/Feb/12 ] |
|
Comment copied from I did some digging in crash to see what state the ptlrpc_request was in. I dug up the pointer address from the backtrace (let's call it <addr1> to save typing). Then, resolving some of the strings that get passed to libcfs_debug_vmsg2() from _debug_req(), I see:
crash> struct ptlrpc_request.rq_import <addr1>
rq_import = 0x0
crash> struct ptlrpc_request.rq_export <addr1>
rq_export = <addr2>
crash> struct obd_export.exp_connection <addr2>
exp_connection = 0x5a5a5a5a5a5a5a5a
crash> struct obd_export.exp_client_uuid <addr2>
exp_client_uuid = {
uuid = "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
}
So the presence of the poison value and the bogus uuid suggests this export has already been destroyed. For reference, here is a snippet from _debug_req() that uses these values:
void _debug_req(struct ptlrpc_request *req,
                struct libcfs_debug_msg_data *msgdata,
                const char *fmt, ... )
{
        va_list args;
        va_start(args, fmt);
        libcfs_debug_vmsg2(msgdata, fmt, args,
                           " req@%p x"LPU64"/t"LPD64"("LPD64") o%d->%s@%s:%d/%d"
                           " lens %d/%d e %d to %d dl "CFS_TIME_T" ref %d "
                           "fl "REQ_FLAGS_FMT"/%x/%x rc %d/%d\n",
                           req, req->rq_xid, req->rq_transno,
                           req->rq_reqmsg ? lustre_msg_get_transno(req->rq_reqmsg) : 0,
                           req->rq_reqmsg && req_ptlrpc_body_swabbed(req) ?
                           lustre_msg_get_opc(req->rq_reqmsg) : -1,
                           req->rq_import ? obd2cli_tgt(req->rq_import->imp_obd) :
                           req->rq_export ?
                           (char*)req->rq_export->exp_client_uuid.uuid : "<?>",
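To illustrate why the two values in the crash output point to the same cause, here is a minimal standalone sketch. It assumes that freed memory is overwritten with the byte 0x5a, which is what the dump suggests. Byte 0x5a is ASCII 'Z', so a poisoned pointer field reads as 0x5a5a5a5a5a5a5a5a on a 64-bit host and a poisoned character array reads as a run of 'Z'. The struct fake_export, the 40-byte uuid size, and all helper names are hypothetical stand-ins, not Lustre definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the allocator fills freed objects with this byte. */
#define POISON_BYTE 0x5a

/* Hypothetical stand-in for the two obd_export fields inspected in crash.
 * The 40-byte uuid size is an assumption made for this sketch. */
struct fake_export {
        void *exp_connection;
        char  exp_client_uuid[40];
};

/* Overwrite an object with the poison byte, as a free path might do. */
void poison(void *p, size_t len)
{
        assert(p != NULL);
        memset(p, POISON_BYTE, len);
}

/* Return 1 if every byte of the pointer-sized field at ptr_field is the
 * poison byte (i.e. it would print as 0x5a5a... in a debugger). */
int pointer_is_poisoned(const void *ptr_field)
{
        unsigned char pattern[sizeof(void *)];

        memset(pattern, POISON_BYTE, sizeof(pattern));
        return memcmp(ptr_field, pattern, sizeof(pattern)) == 0;
}

/* Return 1 if all len bytes of the buffer are the poison byte
 * (i.e. it would print as a run of 'Z' characters). */
int buffer_is_poisoned(const char *buf, size_t len)
{
        size_t i;

        for (i = 0; i < len; i++)
                if ((unsigned char)buf[i] != POISON_BYTE)
                        return 0;
        return 1;
}
```

Calling poison(&e, sizeof(e)) on a struct fake_export e and then checking pointer_is_poisoned(&e.exp_connection) and buffer_is_poisoned(e.exp_client_uuid, sizeof(e.exp_client_uuid)) reproduces the pattern seen above. A debug path could use checks of this kind to skip printing fields of an export that looks freed, though that is only one possible mitigation and not the fix discussed in this ticket.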
|
| Comment by Oleg Drokin [ 11/Feb/12 ] |
|
I think this one also has a chance of being related to LU-106, so let's see if the runs with the patch help. |
| Comment by Ned Bass [ 18/Apr/12 ] |
|
FYI, we did in fact hit this again with the |
| Comment by Ned Bass [ 18/Apr/12 ] |
|
Sorry, disregard previous comment. We hit a new GPF, not this one. |
| Comment by Peter Jones [ 30/Apr/12 ] |
|
Believed to be a duplicate of |