Details
Bug
Resolution: Fixed
Critical
Lustre 2.5.0
None
3
9544
Description
Seen on Hyperion client iwc1.
The node was under serious memory pressure and many allocations were failing.
Build:
http://build.whamcloud.com/job/lustre-master/1594/
Lustre: Lustre: Build Version: jenkins-arch=x86_64,build_type=client,distro=el6,ib_stack=inkernel-1594-gbdf591f-PRISTINE-2.6.32-358.11.1.el6.x86_64
2013-08-05 10:27:53 LustreError: 84566:0:(osc_request.c:2161:osc_build_rpc()) prep_req failed: -12
2013-08-05 10:27:53 LustreError: 84566:0:(osc_request.c:2161:osc_build_rpc()) Skipped 4 previous similar messages
2013-08-05 10:27:53 LustreError: 84566:0:(osc_cache.c:2091:osc_check_rpcs()) Write request failed with -12
2013-08-05 10:27:53 LustreError: 84566:0:(osc_cache.c:2091:osc_check_rpcs()) Skipped 5 previous similar messages
2013-08-05 10:27:53 LustreError: 84566:0:(sec.c:1060:sptlrpc_cli_unwrap_reply()) ASSERTION( req->rq_repdata == ((void *)0) ) failed:
2013-08-05 10:27:53 LustreError: 84566:0:(sec.c:1060:sptlrpc_cli_unwrap_reply()) LBUG
2013-08-05 10:27:53 Pid: 84566, comm: ptlrpcd_7
2013-08-05 10:27:53
2013-08-05 10:27:53 Call Trace:
2013-08-05 10:27:53 [<ffffffffa056e895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2013-08-05 10:27:53 [<ffffffffa056ee97>] lbug_with_loc+0x47/0xb0 [libcfs]
2013-08-05 10:27:53 [<ffffffffa0c52256>] sptlrpc_cli_unwrap_reply+0x1d6/0x240 [ptlrpc]
2013-08-05 10:27:53 [<ffffffffa0c1735f>] after_reply+0x6f/0xd90 [ptlrpc]
2013-08-05 10:27:53 [<ffffffffa057efb1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
2013-08-05 10:27:53 [<ffffffffa0c1c941>] ptlrpc_check_set+0xfd1/0x1b20 [ptlrpc]
2013-08-05 10:27:53 [<ffffffffa0c4752b>] ptlrpcd_check+0x53b/0x560 [ptlrpc]
2013-08-05 10:27:53 [<ffffffffa0c47a43>] ptlrpcd+0x223/0x380 [ptlrpc]
2013-08-05 10:27:53 [<ffffffff81063310>] ? default_wake_function+0x0/0x20
2013-08-05 10:27:53 [<ffffffffa0c47820>] ? ptlrpcd+0x0/0x380 [ptlrpc]
2013-08-05 10:27:53 [<ffffffff81096936>] kthread+0x96/0xa0
2013-08-05 10:27:53 [<ffffffff8100c0ca>] child_rip+0xa/0x20
2013-08-05 10:27:53 [<ffffffff810968a0>] ? kthread+0x0/0xa0
2013-08-05 10:27:53 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
There should be a crash dump for this, which I will look at if needed.
I will dig a little further and update this LU. It looks like we are not handling ENOMEM (-12) quite right in this code path.
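For anyone reading the console output, the -12 in "prep_req failed: -12" and "Write request failed with -12" is a negated errno value, which on Linux is ENOMEM. A minimal sketch of decoding such return codes follows. The helper name rc_name is hypothetical and not part of Lustre, and it only covers a handful of codes for illustration.

```c
#include <errno.h>

/*
 * Hypothetical helper (not a Lustre function): map a kernel-style
 * return code, i.e. a negated errno value, to its symbolic name.
 * Only a few codes are handled; everything else is "UNKNOWN".
 */
const char *rc_name(int rc)
{
    switch (-rc) {
    case 0:
        return "OK";
    case ENOMEM:        /* 12 on Linux: out of memory */
        return "ENOMEM";
    case EIO:
        return "EIO";
    case EINTR:
        return "EINTR";
    default:
        return "UNKNOWN";
    }
}
```

Passing -12 to rc_name returns "ENOMEM" on Linux, matching the allocation failures reported just before the LBUG.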
Attachments
Activity
Resolution | New: Fixed [ 1 ] | |
Status | Original: Open [ 1 ] | New: Resolved [ 5 ] |
Affects Version/s | New: Lustre 2.5.0 [ 10295 ] |
Fix Version/s | New: Lustre 2.5.0 [ 10295 ] |
Assignee | Original: WC Triage [ wc-triage ] | New: Keith Mannthey [ keith ] |
Attachment | New: iwc1-console [ 13329 ] |
Landed for 2.5.0