[LU-5537] ptlrpc_send_reply(): ASSERTION( req->rq_no_reply == 0 ) failed Created: 22/Aug/14 Updated: 05/Jun/15 Resolved: 25/Nov/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Li Wei (Inactive) | Assignee: | Li Wei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 15412 |
| Description |
|
The following assertion failure was seen on an OSS:

Aug 19 17:32:08 lola-2 kernel: Lustre: ost: This server is not able to keep up with request traffic (cpu-bound).
Aug 19 17:32:08 lola-2 kernel: Lustre: 5309:0:(service.c:1509:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=0, svcEst=30, delay=0(jiff)
Aug 19 17:32:08 lola-2 kernel: Lustre: 5309:0:(service.c:1306:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-1s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff880415a7b050 x1476487418415744/t0(0) o400->d8ca812e-ca2b-b357-39ed-b1b134fb6dbd@192.168.1.126@o2ib1:0/0 lens 224/0 e 586846 to 0 dl 1408494727 ref 2 fl Complete:H/c0/ffffffff rc 0/-1
Aug 19 17:32:09 lola-2 kernel: Lustre: soaked-OST0000: Client 87e86655-cbf2-ba09-92c2-7853a9b2c942 (at 192.168.1.119@o2ib1) reconnecting, waiting for 14 clients in recovery for 1:27
Aug 19 17:32:09 lola-2 kernel: LustreError: 5366:0:(ldlm_lib.c:2689:target_bulk_io()) @@@ timeout on bulk GET after 0+0s req@ffff88083a61b400 x1476486691018500/t0(4300509964) o4->8dda3382-83f8-6445-5eea-828fd59e4a06@192.168.1.116@o2ib1:0/0 lens 504/448 e 391470 to 0 dl 1408494729 ref 2 fl Complete:/4/0 rc 0/0
Aug 19 17:32:09 lola-2 kernel: LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) ASSERTION( req->rq_no_reply == 0 ) failed:
Aug 19 17:32:09 lola-2 kernel: Lustre: soaked-OST0000: Bulk IO write error with 8dda3382-83f8-6445-5eea-828fd59e4a06 (at 192.168.1.116@o2ib1), client will retry: rc -110
Aug 19 17:32:09 lola-2 kernel: LustreError: 5432:0:(niobuf.c:550:ptlrpc_send_reply()) LBUG
Aug 19 17:32:09 lola-2 kernel: Pid: 5432, comm: ll_ost_io03_003
Aug 19 17:32:09 lola-2 kernel:
Aug 19 17:32:09 lola-2 kernel: Call Trace:
Aug 19 17:32:09 lola-2 kernel: [<ffffffffa0641895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Aug 19 17:32:09 lola-2 kernel: [<ffffffffa0641e97>] lbug_with_loc+0x47/0xb0 [libcfs]
Aug 19 17:32:09 lola-2 kernel: [<ffffffffa09cda4c>] ptlrpc_send_reply+0x4ec/0x7f0 [ptlrpc]
Aug 19 17:32:09 lola-2 kernel: [<ffffffffa09d4aae>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
Aug 19 17:32:09 lola-2 kernel: [<ffffffffa09e4d75>] ptlrpc_at_check_timed+0xcd5/0x1370 [ptlrpc]
Aug 19 17:32:09 lola-2 kernel: [<ffffffffa09dc1e9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
Aug 19 17:32:09 lola-2 kernel: [<ffffffffa09e66f8>] ptlrpc_main+0x12e8/0x1990 [ptlrpc]
Aug 19 17:32:09 lola-2 kernel: [<ffffffff81069290>] ? pick_next_task_fair+0xd0/0x130
Aug 19 17:32:09 lola-2 kernel: [<ffffffff81529246>] ? schedule+0x176/0x3b0
Aug 19 17:32:09 lola-2 kernel: [<ffffffffa09e5410>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
Aug 19 17:32:09 lola-2 kernel: [<ffffffff8109abf6>] kthread+0x96/0xa0
Aug 19 17:32:09 lola-2 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Aug 19 17:32:09 lola-2 kernel: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
Aug 19 17:32:09 lola-2 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20

It appears to be a race between a BRW (bulk read/write) timeout in target_bulk_io() and an attempt to send an early reply for the same request from ptlrpc_at_check_timed(). |
| Comments |
| Comment by Li Wei (Inactive) [ 03/Sep/14 ] |
| Comment by Gerrit Updater [ 20/Nov/14 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/11740/ |
| Comment by Li Wei (Inactive) [ 24/Nov/14 ] |
|
The patch has landed to master; resolving issue. |
| Comment by Jian Yu [ 26/Nov/14 ] |
|
Hi Li Wei, could you please check whether this issue exists on Lustre b2_5? If it does, could you please back-port the patch? Thank you! |
| Comment by Li Wei (Inactive) [ 27/Nov/14 ] |
|
I took a closer look at b2_5 and realized the problem does not exist there. |
| Comment by Andriy Skulysh [ 05/Jun/15 ] |
|
It can also happen on b2_5:

[2319692.184264] LustreError: 3415:0:(ldlm_lib.c:2724:target_bulk_io()) @@@ Reconnect on bulk PUT req@ffff88043ea04c00 x1499139419601196/t0(0) o3->85f63be7-8ccc-f8bf-ce43-5c0b15598965@273@gni1:0/0 lens 488/432 e 0 to 0 dl 1431173956 ref 1 fl Interpret:/0/0 rc 0/0
[2319692.209262] LustreError: 3415:0:(ldlm_lib.c:2724:target_bulk_io()) Skipped 4 previous similar messages
[2319692.219701] Lustre: snx11128-OST003c: Bulk IO read error with 85f63be7-8ccc-f8bf-ce43-5c0b15598965 (at 273@gni1), client will retry: rc -110
[2319692.548327] LustreError: 65406:0:(niobuf.c:545:ptlrpc_send_reply()) ASSERTION( req->rq_no_reply == 0 ) failed:
[2319692.559511] LustreError: 65406:0:(niobuf.c:545:ptlrpc_send_reply()) LBUG
[2319692.566910] Pid: 65406, comm: ll_ost_io02_008 |