[LU-15550] WBC: retry the batched RPC when the reply buffer is overflowed Created: 11/Feb/22  Updated: 16/Jun/23  Resolved: 01/May/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Improvement Priority: Minor
Reporter: Qian Yingjin Assignee: Qian Yingjin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-15975 Statahead_V1 Features Resolved
is related to LU-16907 sanity test_123f: crashed MDS with Ma... Open
is related to LU-14139 batched statahead processing Resolved

 Description   

Before sending a batched RPC, the client has no way to know the actual reply buffer size that will be needed. When the reply buffer the client prepared turns out to be smaller than required, the server can grow the reply buffer accordingly.

But when the needed reply buffer size exceeds BUT_MAXREPSIZE (1000 * 1024), the server returns the -EOVERFLOW error code. In that case the server has executed only some of the sub-requests in the batched RPC; the sub-requests that overflowed the reply buffer are not executed.

Thus the client needs a retry mechanism: on finding that the reply buffer overflowed, the client rebuilds the batched RPC from the sub-requests that were not executed on the server and sends it again so that the server can execute them.
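
A minimal pseudo-C sketch of that retry flow (struct batch_head, bh_first_unsent, bh_sub_count, batch_rpc_pack_from() and batch_replies_unpacked() are hypothetical names used only for illustration; ptlrpc_queue_wait() and ptlrpc_req_finished() are the real ptlrpc entry points; the actual implementation is in the patch below):

        /* Send the batched RPC, resending the unexecuted tail of the
         * batch whenever the server reports a reply buffer overflow.
         * Assumes the server always executes at least one sub-request
         * per round, so the loop makes progress. */
        static int batch_send_with_retry(struct batch_head *head)
        {
                struct ptlrpc_request *req;
                int rc;

                while (head->bh_first_unsent < head->bh_sub_count) {
                        /* Rebuild a batched RPC containing only the
                         * sub-requests not yet executed on the server. */
                        req = batch_rpc_pack_from(head, head->bh_first_unsent);
                        if (IS_ERR(req))
                                return PTR_ERR(req);

                        rc = ptlrpc_queue_wait(req);
                        if (rc == -EOVERFLOW) {
                                /* The server stopped once the reply hit
                                 * BUT_MAXREPSIZE; count the sub-replies it
                                 * did return and resend the remainder. */
                                head->bh_first_unsent +=
                                        batch_replies_unpacked(req);
                                ptlrpc_req_finished(req);
                                continue;
                        }
                        ptlrpc_req_finished(req);
                        return rc;
                }

                return 0;
        }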



 Comments   
Comment by Gerrit Updater [ 17/Feb/22 ]

"Yingjin Qian <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/46540
Subject: LU-15550 ptlrpc: retry mechanism for overflowed batched RPCs
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3a33b10c3169056c5837869482fe96a6ea814a14

Comment by Gerrit Updater [ 01/May/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/46540/
Subject: LU-15550 ptlrpc: retry mechanism for overflowed batched RPCs
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 668f48f87bec3999892ce1daad24b6dba9ae362b

Comment by Peter Jones [ 01/May/23 ]

Landed for 2.16

Comment by Alex Zhuravlev [ 09/May/23 ]

please check:

[ 4604.470775] Lustre: DEBUG MARKER: == sanity test 123f: Retry mechanism with large wide striping files ========================================================== 15:59:19 (1683388759)
[ 4668.109917] Lustre: lustre-OST0000-osc-MDT0000: update sequence from 0x280000bd0 to 0x2800013a0
[ 4668.697696] Lustre: lustre-OST0001-osc-MDT0000: update sequence from 0x2c0000bd0 to 0x2c00013a0
[ 4791.809264] Lustre: lustre-OST0001-osc-MDT0000: update sequence from 0x2c00013a0 to 0x2c00013a1
[ 4791.859391] Lustre: lustre-OST0000-osc-MDT0000: update sequence from 0x2800013a0 to 0x2800013a1
[ 4809.801406] ------------[ cut here ]------------
[ 4809.801558] Max IOV exceeded: 257 should be < 256
[ 4809.801678] WARNING: CPU: 0 PID: 116874 at /home/lustre/master-mine/lnet/lnet/lib-md.c:257 lnet_md_build.part.1+0x4b3/0x770 [lnet]
[ 4809.801847] Modules linked in: lustre(O) ofd(O) osp(O) lod(O) ost(O) mdt(O) mdd(O) mgs(O) osd_zfs(O) lquota(O) lfsck(O) obdecho(O) mgc(O) mdc(O) lov(O) osc(O) lmv(O) fid(O) fld(O) ptlrpc(O) obdclass(O) ksocklnd(O) lnet(O) libcfs(O) zfs(O) zunicode(O) zzstd(O) zlua(O) zcommon(O) znvpair(O) zavl(O) icp(O) spl(O) [last unloaded: libcfs]
[ 4809.802263] CPU: 0 PID: 116874 Comm: mdt_out00_000 Tainted: G        W  O     --------- -  - 4.18.0 #2
[ 4809.802389] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 4809.802492] RIP: 0010:lnet_md_build.part.1+0x4b3/0x770 [lnet]
[ 4809.802589] Code: 00 00 48 c7 05 da 55 07 00 00 00 00 00 e8 f5 88 fb ff e9 05 fc ff ff ba 00 01 00 00 89 ee 48 c7 c7 38 49 7d c0 e8 c3 d7 91 d6 <0f> 0b e9 f6 fc ff ff f6 05 4b 38 fd ff 10 49 c7 c4 f4 ff ff ff 0f
[ 4809.802838] RSP: 0018:ffff972524367c10 EFLAGS: 00010282
[ 4809.802918] RAX: 0000000000000025 RBX: 0000000000000168 RCX: 0000000000000007
[ 4809.803028] RDX: 0000000000000007 RSI: 0000000000000022 RDI: ffff97256cbe5450
[ 4809.803147] RBP: 0000000000000101 R08: 000004cd62abfed5 R09: 0000000000000000
[ 4809.803257] R10: 0000000000000001 R11: 00000000ffffffff R12: ffff97247bb2a000
[ 4809.803367] R13: 0000000000000000 R14: ffff972242f00168 R15: ffff972524367c90
[ 4809.803477] FS:  0000000000000000(0000) GS:ffff97256ca00000(0000) knlGS:0000000000000000
[ 4809.803587] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4809.803679] CR2: 000055aeee75bf44 CR3: 00000003104c2000 CR4: 00000000000006b0
[ 4809.803792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4809.803910] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4809.804020] Call Trace:
[ 4809.804074]  LNetMDBind+0x3e/0x380 [lnet]
[ 4809.804193]  ptl_send_buf+0x117/0x570 [ptlrpc]
[ 4809.804307]  ? at_measured+0x1eb/0x300 [ptlrpc]
[ 4809.804419]  ? reply_out_callback+0x390/0x390 [ptlrpc]
[ 4809.804529]  ptlrpc_send_reply+0x2b4/0x8f0 [ptlrpc]
[ 4809.804647]  target_send_reply+0x343/0x770 [ptlrpc]
[ 4809.804768]  tgt_request_handle+0x9ca/0x1a40 [ptlrpc]
[ 4809.804885]  ? lustre_msg_get_transno+0x6f/0xd0 [ptlrpc]
[ 4809.804998]  ptlrpc_main+0x1784/0x32a0 [ptlrpc]
[ 4809.805083]  ? __kthread_parkme+0x33/0x90
[ 4809.805172]  ? ptlrpc_wait_event+0x4b0/0x4b0 [ptlrpc]
[ 4809.805254]  kthread+0x129/0x140
[ 4809.805314]  ? kthread_flush_work_fn+0x10/0x10
[ 4809.805391]  ret_from_fork+0x1f/0x30
[ 4809.805450] ---[ end trace 02ea061606b4429e ]---
Comment by Qian Yingjin [ 09/May/23 ]

Is there any stable reproducer for this panic?

It seems something is wrong in the calculation of the message buffer size...

Comment by Alex Zhuravlev [ 09/May/23 ]

Is there any stable reproducer for this panic?

no, just a single hit so far, local testing.

Comment by Qian Yingjin [ 10/May/23 ]

The panic was also hit in Maloo testing:
https://testing.whamcloud.com/test_logs/b34c1c0d-4572-43b2-bc87-b4e32ba5e992/show_text
It is likely a bug related to batched statahead.

Comment by Qian Yingjin [ 10/May/23 ]

Adding just a single line (marked below) to the reply buffer grow code:

        max = BUT_MAXREPSIZE - req->rq_replen;
        if (used_len + msg_len > len)
                len = used_len + msg_len;

        if (len > max)
                len += max;
        else
                len += len;

        len += 10 * PAGE_SIZE;                /* <<< the added line */
        rc = req_capsule_server_grow(&req->rq_pill,
                                     &RMF_BUT_REPLY, len);

It caused the panic:

[ 6059.646103] Max IOV exceeded: 262 should be < 256
[ 6059.646267] RIP: 0010:lnet_md_build.part.10+0x525/0x790 [lnet]
[ 6059.646324]  LNetMDBind+0x48/0x380 [lnet]
[ 6059.646408]  ptl_send_buf+0x144/0x5a0 [ptlrpc]
[ 6059.646436]  ? ptlrpc_ni_fini+0x60/0x60 [ptlrpc]
[ 6059.646461]  ptlrpc_send_reply+0x2ad/0x8d0 [ptlrpc]
[ 6059.646493]  target_send_reply+0x324/0x7d0 [ptlrpc]
[ 6059.646526]  tgt_request_handle+0xe81/0x1920 [ptlrpc]
[ 6059.646554]  ptlrpc_server_handle_request+0x31d/0xbc0 [ptlrpc]
[ 6059.646580]  ? lprocfs_counter_add+0x12a/0x1a0 [obdclass]
[ 6059.646607]  ptlrpc_main+0xc4e/0x1510 [ptlrpc]
[ 6059.646610]  ? __schedule+0x2cc/0x700
[ 6059.646637]  ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
[ 6059.646639]  kthread+0x116/0x130
[ 6059.646640]  ? kthread_flush_work_fn+0x10/0x10
[ 6059.646641]  ret_from_fork+0x1f/0x40
[ 6059.646642] ---[ end trace d2884932fb9123c4 ]---

The calculation of the maximum allowed reply buffer size is incorrect.
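
For reference, one way to keep the grown reply within the legal limit is to clamp the new length against the space remaining under BUT_MAXREPSIZE, which is deliberately sized to fit in fewer than LNET_MAX_IOV (256) pages. A rough sketch only, not necessarily the landed fix (min_t() is the standard kernel helper):

        max = BUT_MAXREPSIZE - req->rq_replen;
        if (used_len + msg_len > max)
                /* Even the largest allowed reply buffer cannot hold the
                 * next sub-reply: return -EOVERFLOW so the client retries
                 * the remaining sub-requests in a new batched RPC. */
                RETURN(-EOVERFLOW);

        if (used_len + msg_len > len)
                len = used_len + msg_len;

        /* Grow geometrically to limit the number of resizes, but never
         * past BUT_MAXREPSIZE; growing beyond it risks a reply that needs
         * more than LNET_MAX_IOV pages, so LNet refuses to build the MD,
         * which is exactly the "Max IOV exceeded" warning above. */
        len = min_t(int, len * 2, max);
        rc = req_capsule_server_grow(&req->rq_pill, &RMF_BUT_REPLY, len);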
