[LU-3698] sec.c:1060:sptlrpc_cli_unwrap_reply() ASSERTION( req->rq_repdata == ((void *)0) ) failed - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: Lustre 2.5.0
Affects Version/s: Lustre 2.5.0
Labels:
None

Severity:
3
Rank (Obsolete):
9544

Description

Seen on Hyperion client iwc1.

The client was under serious memory pressure, and many allocations were failing.

Build:
http://build.whamcloud.com/job/lustre-master/1594/

Lustre: Lustre: Build Version: jenkins-arch=x86_64,build_type=client,distro=el6,ib_stack=inkernel-1594-gbdf591f-PRISTINE-2.6.32-358.11.1.el6.x86_64

2013-08-05 10:27:53 LustreError: 84566:0:(osc_request.c:2161:osc_build_rpc()) prep_req failed: -12
2013-08-05 10:27:53 LustreError: 84566:0:(osc_request.c:2161:osc_build_rpc()) Skipped 4 previous similar messages
2013-08-05 10:27:53 LustreError: 84566:0:(osc_cache.c:2091:osc_check_rpcs()) Write request failed with -12
2013-08-05 10:27:53 LustreError: 84566:0:(osc_cache.c:2091:osc_check_rpcs()) Skipped 5 previous similar messages
2013-08-05 10:27:53 LustreError: 84566:0:(sec.c:1060:sptlrpc_cli_unwrap_reply()) ASSERTION( req->rq_repdata == ((void *)0) ) failed:
2013-08-05 10:27:53 LustreError: 84566:0:(sec.c:1060:sptlrpc_cli_unwrap_reply()) LBUG
2013-08-05 10:27:53 Pid: 84566, comm: ptlrpcd_7
2013-08-05 10:27:53
2013-08-05 10:27:53 Call Trace:
2013-08-05 10:27:53  [<ffffffffa056e895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2013-08-05 10:27:53  [<ffffffffa056ee97>] lbug_with_loc+0x47/0xb0 [libcfs]
2013-08-05 10:27:53  [<ffffffffa0c52256>] sptlrpc_cli_unwrap_reply+0x1d6/0x240 [ptlrpc]
2013-08-05 10:27:53  [<ffffffffa0c1735f>] after_reply+0x6f/0xd90 [ptlrpc]
2013-08-05 10:27:53  [<ffffffffa057efb1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
2013-08-05 10:27:53  [<ffffffffa0c1c941>] ptlrpc_check_set+0xfd1/0x1b20 [ptlrpc]
2013-08-05 10:27:53  [<ffffffffa0c4752b>] ptlrpcd_check+0x53b/0x560 [ptlrpc]
2013-08-05 10:27:53  [<ffffffffa0c47a43>] ptlrpcd+0x223/0x380 [ptlrpc]
2013-08-05 10:27:53  [<ffffffff81063310>] ? default_wake_function+0x0/0x20
2013-08-05 10:27:53  [<ffffffffa0c47820>] ? ptlrpcd+0x0/0x380 [ptlrpc]
2013-08-05 10:27:53  [<ffffffff81096936>] kthread+0x96/0xa0
2013-08-05 10:27:53  [<ffffffff8100c0ca>] child_rip+0xa/0x20
2013-08-05 10:27:53  [<ffffffff810968a0>] ? kthread+0x0/0xa0
2013-08-05 10:27:53  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

There should be a crash dump for this, which I will look at if needed.

I will do a little more digging and update this LU further. It looks like we are not handling ENOMEM (-12) quite right in this code path.
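
To make the suspicion above concrete, here is a minimal userspace sketch of one general way an error path can leave state behind that a later NULL-check assertion trips over. Everything in it is hypothetical: `struct demo_req`, `demo_process_reply()` and the `simulate_enomem` / `reset_on_error` flags are illustrative names, not Lustre code, and the sketch does not claim to be the root cause of this LBUG. It only shows the class of problem, with `-EINVAL` standing in for the failed assertion.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a request; NOT the real struct ptlrpc_request. */
struct demo_req {
    void *repdata;  /* expected to be NULL whenever a reply is about to be unwrapped */
};

/*
 * Hypothetical reply handler.
 *
 * simulate_enomem: pretend an allocation fails after repdata has been set.
 * reset_on_error:  whether the error path clears repdata before returning.
 *
 * Returns 0 on success, -ENOMEM on the simulated allocation failure, and
 * -EINVAL where the real code would hit the assertion
 * (req->rq_repdata == NULL) and LBUG.
 */
static int demo_process_reply(struct demo_req *req, void *reply_buf,
                              int simulate_enomem, int reset_on_error)
{
    if (req->repdata != NULL)
        return -EINVAL;         /* stands in for the failed assertion */

    req->repdata = reply_buf;   /* reply is now "being unwrapped" */

    if (simulate_enomem) {
        if (reset_on_error)
            req->repdata = NULL;    /* error path restores the invariant */
        return -ENOMEM;             /* -12, as in the console log */
    }

    req->repdata = NULL;        /* success path: reply consumed */
    return 0;
}
```

In this sketch, a first call that fails with -ENOMEM and does not reset `repdata` causes the next call on the same request to fail the NULL check, while resetting the pointer on the error path lets the request be processed again. Whether the real code path behaves this way is something the crash dump would have to confirm.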

Attachments

iwc1-console (229 kB), attached 06/Aug/13 2:35 AM by Keith Mannthey

People

Assignee: Keith Mannthey (Inactive)

Reporter: Keith Mannthey (Inactive)

Votes: 0

Watchers: 6

Dates

Created: 05/Aug/13 6:53 PM

Updated: 24/Sep/13 4:58 AM

Resolved: 24/Sep/13 4:58 AM