LU-5537: ptlrpc_send_reply(): ASSERTION( req->rq_no_reply == 0 ) failed

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.7.0
    • Lustre 2.7.0
    • None
    • 3
    • 15412

    Description

      The following assertion failure was seen on an OSS:

      Aug 19 17:32:08 lola-2 kernel: Lustre: ost: This server is not able to keep up with request traffic (cpu-bound).
      Aug 19 17:32:08 lola-2 kernel: Lustre: 5309:0:(service.c:1509:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=0, svcEst=30, delay=0(jiff)
      Aug 19 17:32:08 lola-2 kernel: Lustre: 5309:0:(service.c:1306:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)?  req@ffff880415a7b050 x1476487418415744/t0(0) o400->d8ca812e-ca2b-b357-39ed-b1b134fb6dbd@192.168.1.126@o2ib1:0/0 lens 224/0 e 586846 to 0 dl 1408494727 ref 2 fl Complete:H/c0/ffffffff rc 0/-1
      Aug 19 17:32:09 lola-2 kernel: Lustre: soaked-OST0000: Client 87e86655-cbf2-ba09-92c2-7853a9b2c942 (at 192.168.1.119@o2ib1) reconnecting, waiting for 14 clients in recovery for 1:27
      Aug 19 17:32:09 lola-2 kernel: LustreError: 5366:0:(ldlm_lib.c:2689:target_bulk_io()) @@@ timeout on bulk GET after 0+0s  req@ffff88083a61b400 x1476486691018500/t0(4300509964) o4->8dda3382-83f8-6445-5eea-828fd59e4a06@192.168.1.116@o2ib1:0/0 lens 504/448 e 391470 to 0 dl 1408494729 ref 2 fl Complete:/4/0 rc 0/0
      Aug 19 17:32:09 lola-2 kernel: LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) ASSERTION( req->rq_no_reply == 0 ) failed:
      Aug 19 17:32:09 lola-2 kernel: Lustre: soaked-OST0000: Bulk IO write error with 8dda3382-83f8-6445-5eea-828fd59e4a06 (at 192.168.1.116@o2ib1), client will retry: rc -110
      Aug 19 17:32:09 lola-2 kernel: LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) LBUG
      Aug 19 17:32:09 lola-2 kernel: Pid: 5432, comm: ll_ost_io03_003
      Aug 19 17:32:09 lola-2 kernel:
      Aug 19 17:32:09 lola-2 kernel: Call Trace:
      Aug 19 17:32:09 lola-2 kernel: [<ffffffffa0641895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      Aug 19 17:32:09 lola-2 kernel: [<ffffffffa0641e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      Aug 19 17:32:09 lola-2 kernel: [<ffffffffa09cda4c>] ptlrpc_send_reply+0x4ec/0x7f0 [ptlrpc]
      Aug 19 17:32:09 lola-2 kernel: [<ffffffffa09d4aae>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
      Aug 19 17:32:09 lola-2 kernel: [<ffffffffa09e4d75>] ptlrpc_at_check_timed+0xcd5/0x1370 [ptlrpc]
      Aug 19 17:32:09 lola-2 kernel: [<ffffffffa09dc1e9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
      Aug 19 17:32:09 lola-2 kernel: [<ffffffffa09e66f8>] ptlrpc_main+0x12e8/0x1990 [ptlrpc]
      Aug 19 17:32:09 lola-2 kernel: [<ffffffff81069290>] ? pick_next_task_fair+0xd0/0x130
      Aug 19 17:32:09 lola-2 kernel: [<ffffffff81529246>] ? schedule+0x176/0x3b0
      Aug 19 17:32:09 lola-2 kernel: [<ffffffffa09e5410>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
      Aug 19 17:32:09 lola-2 kernel: [<ffffffff8109abf6>] kthread+0x96/0xa0
      Aug 19 17:32:09 lola-2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
      Aug 19 17:32:09 lola-2 kernel: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      Aug 19 17:32:09 lola-2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      

      It appears to be a race between a bulk read/write (BRW) timeout and an attempt to send an early reply.
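
The suspected interleaving can be sketched with a small single-threaded model. This is not Lustre code: `model_req`, `model_bulk_timeout()`, `model_send_reply()` and `model_race()` are hypothetical names, and the only thing taken from the report is the `rq_no_reply` flag checked by the failed assertion. The model assumes that the bulk timeout path marks the request as "no reply" before the early-reply path reaches the send, and it shows one possible guard (checking the flag before sending); it makes no claim about how the issue was actually fixed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a request; only the asserted flag is modeled. */
struct model_req {
    bool no_reply;      /* models req->rq_no_reply */
    int  replies_sent;  /* replies that actually went out */
};

/* Assumption: after a bulk timeout the I/O path decides not to reply,
 * so that the client retries the request. */
static void model_bulk_timeout(struct model_req *req)
{
    req->no_reply = true;
}

/* Models the reply path. Returns false at the point where
 * ASSERTION( req->rq_no_reply == 0 ) would fail in the report. */
static bool model_send_reply(struct model_req *req)
{
    if (req->no_reply)
        return false;
    req->replies_sent++;
    return true;
}

/* Replays one interleaving: the bulk timeout lands before the
 * early-reply path sends. With guard set, the early reply is skipped
 * when the flag is already set (a hypothetical mitigation). */
static bool model_race(bool guard)
{
    struct model_req req = { .no_reply = false, .replies_sent = 0 };

    model_bulk_timeout(&req);       /* I/O thread: bulk GET timed out */
    if (guard && req.no_reply)
        return true;                /* nothing sent, no assertion hit */
    return model_send_reply(&req);  /* early-reply path reaches the send */
}
```

In this model `model_race(false)` returns false, which corresponds to the LBUG in the trace, while `model_race(true)` returns true because the early reply is skipped once the flag is set.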

          People

            liwei Li Wei (Inactive)
            liwei Li Wei (Inactive)