Lustre / LU-3698

sec.c:1060:sptlrpc_cli_unwrap_reply() ASSERTION( req->rq_repdata == ((void *)0) ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.5.0
    • Affects Version/s: Lustre 2.5.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9544

    Description

      As seen on Hyperion client iwc1.

      The client was under serious memory pressure, and many allocations were failing.

      Build:
      http://build.whamcloud.com/job/lustre-master/1594/

      Lustre: Lustre: Build Version: jenkins-arch=x86_64,build_type=client,distro=el6,ib_stack=inkernel-1594-gbdf591f-PRISTINE-2.6.32-358.11.1.el6.x86_64
      
      2013-08-05 10:27:53 LustreError: 84566:0:(osc_request.c:2161:osc_build_rpc()) prep_req failed: -12
      2013-08-05 10:27:53 LustreError: 84566:0:(osc_request.c:2161:osc_build_rpc()) Skipped 4 previous similar messages
      2013-08-05 10:27:53 LustreError: 84566:0:(osc_cache.c:2091:osc_check_rpcs()) Write request failed with -12
      2013-08-05 10:27:53 LustreError: 84566:0:(osc_cache.c:2091:osc_check_rpcs()) Skipped 5 previous similar messages
      2013-08-05 10:27:53 LustreError: 84566:0:(sec.c:1060:sptlrpc_cli_unwrap_reply()) ASSERTION( req->rq_repdata == ((void *)0) ) failed:
      2013-08-05 10:27:53 LustreError: 84566:0:(sec.c:1060:sptlrpc_cli_unwrap_reply()) LBUG
      2013-08-05 10:27:53 Pid: 84566, comm: ptlrpcd_7
      2013-08-05 10:27:53
      2013-08-05 10:27:53 Call Trace:
      2013-08-05 10:27:53  [<ffffffffa056e895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      2013-08-05 10:27:53  [<ffffffffa056ee97>] lbug_with_loc+0x47/0xb0 [libcfs]
      2013-08-05 10:27:53  [<ffffffffa0c52256>] sptlrpc_cli_unwrap_reply+0x1d6/0x240 [ptlrpc]
      2013-08-05 10:27:53  [<ffffffffa0c1735f>] after_reply+0x6f/0xd90 [ptlrpc]
      2013-08-05 10:27:53  [<ffffffffa057efb1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      2013-08-05 10:27:53  [<ffffffffa0c1c941>] ptlrpc_check_set+0xfd1/0x1b20 [ptlrpc]
      2013-08-05 10:27:53  [<ffffffffa0c4752b>] ptlrpcd_check+0x53b/0x560 [ptlrpc]
      2013-08-05 10:27:53  [<ffffffffa0c47a43>] ptlrpcd+0x223/0x380 [ptlrpc]
      2013-08-05 10:27:53  [<ffffffff81063310>] ? default_wake_function+0x0/0x20
      2013-08-05 10:27:53  [<ffffffffa0c47820>] ? ptlrpcd+0x0/0x380 [ptlrpc]
      2013-08-05 10:27:53  [<ffffffff81096936>] kthread+0x96/0xa0
      2013-08-05 10:27:53  [<ffffffff8100c0ca>] child_rip+0xa/0x20
      2013-08-05 10:27:53  [<ffffffff810968a0>] ? kthread+0x0/0xa0
      2013-08-05 10:27:53  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
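
For triaging the attached console log, the source location in each LustreError line can be pulled out mechanically. The following is a minimal sketch in C, assuming the line layout seen above: `LustreError: <pid>:<number>:(<file>:<line>:<function>())`. The names `lustre_log_loc` and `parse_lustre_loc` are hypothetical helpers for this sketch and are not part of Lustre. The meaning of the second numeric field is not stated in this ticket, so the sketch reads it and discards it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for this sketch; not a Lustre structure. */
struct lustre_log_loc {
    int  pid;        /* first numeric field, e.g. 84566 */
    char file[64];   /* source file, e.g. "sec.c" */
    int  line;       /* source line, e.g. 1060 */
    char func[64];   /* function name without "()" */
};

/*
 * Parse the "pid:n:(file:line:func())" part of a console line.
 * Returns 0 on success, -1 if the line does not match the assumed layout.
 */
static int parse_lustre_loc(const char *msg, struct lustre_log_loc *out)
{
    const char *tag = "LustreError: ";
    const char *p = strstr(msg, tag);
    int second_field; /* read and ignored; meaning not given in the ticket */

    if (p == NULL)
        return -1;
    p += strlen(tag);

    /* Scansets stop at ':' for the file name and at '(' for the function. */
    if (sscanf(p, "%d:%d:(%63[^:]:%d:%63[^(]",
               &out->pid, &second_field,
               out->file, &out->line, out->func) != 5)
        return -1;

    return 0;
}
```

Applied to the LBUG line above, this would yield pid 84566, file `sec.c`, line 1060, and function `sptlrpc_cli_unwrap_reply`. Lines without a `LustreError:` tag, such as the call trace entries, are rejected.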
      

      There should be a crash dump for this, which I will look at if needed.

      I will do a little more digging and update this LU further. It looks like we are not handling ENOMEM (-12) quite right in this code path.
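
To make the failed assertion concrete, here is a small model in C of the invariant it checks: the reply-data pointer must be NULL when unwrapping starts. This is only an illustration under stated assumptions, not the Lustre code and not a diagnosis; the root cause is not established in this ticket. `fake_req`, `fake_unwrap_reply`, and `fake_reset_reply` are hypothetical names, and the model returns `-EINVAL` where the real code would hit the LBUG.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a request; models only the asserted field. */
struct fake_req {
    void *repdata;     /* models req->rq_repdata */
    char  repbuf[16];  /* models a receive buffer */
};

/*
 * Models the precondition from the assertion: repdata must be NULL on entry.
 * Returns 0 on success, -EINVAL where the real code would LBUG.
 */
static int fake_unwrap_reply(struct fake_req *req)
{
    if (req->repdata != NULL)
        return -EINVAL;
    req->repdata = req->repbuf;
    return 0;
}

/*
 * Assumed error-path cleanup: clear reply state so the same request
 * can go through the unwrap step again without tripping the check.
 */
static void fake_reset_reply(struct fake_req *req)
{
    req->repdata = NULL;
}
```

In this model, unwrapping the same request twice fails on the second pass unless `fake_reset_reply` runs in between. If an ENOMEM error path skips a cleanup step like this one, a later pass through the reply handling would see stale state, which is one way an assertion of this shape could fire.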

      Attachments

        1. iwc1-console
          229 kB
          Keith Mannthey

          People

            Assignee: Keith Mannthey (Inactive)
            Reporter: Keith Mannthey (Inactive)