
WBC: retry the batched RPC when the reply buffer is overflowed

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0

    Description

      Before sending a batched RPC, the client has no idea of the actual reply buffer size that will be needed.
      The reply buffer the client prepared may be smaller than the reply buffer actually required; in that case the server can grow the reply buffer accordingly.

      However, when the required reply buffer size is larger than BUT_MAXREPSIZE (1000 * 1024), the server returns the -EOVERFLOW error code. At that point the server has only executed part of the sub requests in the batched RPC; the overflowed sub requests are not executed.

      Thus the client needs a retry mechanism: when it finds that the reply buffer overflowed, the client rebuilds the batched RPC from the sub requests that were not executed on the server and sends it again so they can be re-executed.
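
      A minimal sketch of such a client-side retry loop is shown below. The helper names (wbc_batch_build(), wbc_batch_send()), the structures and the bookkeeping are hypothetical and only illustrate the flow described above; they are not the actual Lustre API used by the landed patch.

          /* Send @total sub requests in batches.  When the server replies
           * -EOVERFLOW it has executed only the first @executed sub requests
           * of the current batch, so rebuild a new batched RPC from the
           * remaining ones and send it again. */
          static int wbc_batch_send_all(struct sub_request *subs, int total)
          {
                  int submitted = 0;

                  while (submitted < total) {
                          struct batch_rpc *brpc;
                          int executed = 0;
                          int rc;

                          /* hypothetical helper: pack the not-yet-executed
                           * sub requests into one batched RPC */
                          brpc = wbc_batch_build(subs + submitted,
                                                 total - submitted);
                          if (IS_ERR(brpc))
                                  return PTR_ERR(brpc);

                          /* hypothetical helper: send the batch and report
                           * how many sub requests the server executed */
                          rc = wbc_batch_send(brpc, &executed);
                          submitted += executed;

                          if (rc == -EOVERFLOW && executed > 0)
                                  continue;       /* retry the rest */
                          if (rc)
                                  return rc;      /* real failure */
                  }

                  return 0;
          }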

          Activity

            [LU-15550] WBC: retry the batched RPC when the reply buffer is overflowed
            qian_wc Qian Yingjin added a comment -

            Just adding a single line (the one marked with "// +++++++++++") in the following code:

                                    max = BUT_MAXREPSIZE - req->rq_replen;
                                    if (used_len + msg_len > len)
                                            len = used_len + msg_len;
            
                                    if (len > max)
                                            len += max;
                                    else
                                            len += len;
            
                                    len += 10 * PAGE_SIZE; // +++++++++++
                                    rc = req_capsule_server_grow(&req->rq_pill,
                                                                 &RMF_BUT_REPLY, len);
            

            This caused the following panic:

            [ 6059.646103] Max IOV exceeded: 262 should be < 256
            [ 6059.646267] RIP: 0010:lnet_md_build.part.10+0x525/0x790 [lnet]
            [ 6059.646324]  LNetMDBind+0x48/0x380 [lnet]
            [ 6059.646408]  ptl_send_buf+0x144/0x5a0 [ptlrpc]
            [ 6059.646436]  ? ptlrpc_ni_fini+0x60/0x60 [ptlrpc]
            [ 6059.646461]  ptlrpc_send_reply+0x2ad/0x8d0 [ptlrpc]
            [ 6059.646493]  target_send_reply+0x324/0x7d0 [ptlrpc]
            [ 6059.646526]  tgt_request_handle+0xe81/0x1920 [ptlrpc]
            [ 6059.646554]  ptlrpc_server_handle_request+0x31d/0xbc0 [ptlrpc]
            [ 6059.646580]  ? lprocfs_counter_add+0x12a/0x1a0 [obdclass]
            [ 6059.646607]  ptlrpc_main+0xc4e/0x1510 [ptlrpc]
            [ 6059.646610]  ? __schedule+0x2cc/0x700
            [ 6059.646637]  ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
            [ 6059.646639]  kthread+0x116/0x130
            [ 6059.646640]  ? kthread_flush_work_fn+0x10/0x10
            [ 6059.646641]  ret_from_fork+0x1f/0x40
            [ 6059.646642] ---[ end trace d2884932fb9123c4 ]---
            

            The calculation of the maximum allowed reply buffer size is incorrect.
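
            For reference: the reply message apparently has to fit within LNET_MAX_IOV (256) pages, i.e. 256 * 4096 = 1 MiB with 4 KiB pages. BUT_MAXREPSIZE (1000 * 1024) keeps the reply below that limit, but unconditionally adding 10 * PAGE_SIZE on top can push it past 256 pages, which matches the "262 should be < 256" warning above. A minimal sketch, reusing the variables from the quoted snippet and not necessarily matching the landed patch, would clamp the grown length to the remaining room instead (if even the clamped size is too small, the server replies -EOVERFLOW and the client-side retry from the ticket description takes over):

                /* room left for the batch reply field in this reply message */
                max = BUT_MAXREPSIZE - req->rq_replen;

                /* make sure the next sub reply fits */
                if (used_len + msg_len > len)
                        len = used_len + msg_len;

                /* grow geometrically to amortize repeated regrows, but never
                 * past the remaining room, so the reply never needs more than
                 * LNET_MAX_IOV pages */
                len = min_t(int, len << 1, max);

                rc = req_capsule_server_grow(&req->rq_pill,
                                             &RMF_BUT_REPLY, len);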

            qian_wc Qian Yingjin added a comment -

            The panic was also hit in Maloo testing:
            https://testing.whamcloud.com/test_logs/b34c1c0d-4572-43b2-bc87-b4e32ba5e992/show_text
            It is likely a bug related to batched statahead.


            bzzz Alex Zhuravlev added a comment -

            Is there any stable reproducer for this panic?

            No, just a single hit so far, in local testing.
            qian_wc Qian Yingjin added a comment -

            Is there any stable reproducer for this panic?

            It seems there is something wrong in the calculation of the message buffer size...


            bzzz Alex Zhuravlev added a comment -

            Please check:

            [ 4604.470775] Lustre: DEBUG MARKER: == sanity test 123f: Retry mechanism with large wide striping files ========================================================== 15:59:19 (1683388759)
            [ 4668.109917] Lustre: lustre-OST0000-osc-MDT0000: update sequence from 0x280000bd0 to 0x2800013a0
            [ 4668.697696] Lustre: lustre-OST0001-osc-MDT0000: update sequence from 0x2c0000bd0 to 0x2c00013a0
            [ 4791.809264] Lustre: lustre-OST0001-osc-MDT0000: update sequence from 0x2c00013a0 to 0x2c00013a1
            [ 4791.859391] Lustre: lustre-OST0000-osc-MDT0000: update sequence from 0x2800013a0 to 0x2800013a1
            [ 4809.801406] ------------[ cut here ]------------
            [ 4809.801558] Max IOV exceeded: 257 should be < 256
            [ 4809.801678] WARNING: CPU: 0 PID: 116874 at /home/lustre/master-mine/lnet/lnet/lib-md.c:257 lnet_md_build.part.1+0x4b3/0x770 [lnet]
            [ 4809.801847] Modules linked in: lustre(O) ofd(O) osp(O) lod(O) ost(O) mdt(O) mdd(O) mgs(O) osd_zfs(O) lquota(O) lfsck(O) obdecho(O) mgc(O) mdc(O) lov(O) osc(O) lmv(O) fid(O) fld(O) ptlrpc(O) obdclass(O) ksocklnd(O) lnet(O) libcfs(O) zfs(O) zunicode(O) zzstd(O) zlua(O) zcommon(O) znvpair(O) zavl(O) icp(O) spl(O) [last unloaded: libcfs]
            [ 4809.802263] CPU: 0 PID: 116874 Comm: mdt_out00_000 Tainted: G        W  O     --------- -  - 4.18.0 #2
            [ 4809.802389] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
            [ 4809.802492] RIP: 0010:lnet_md_build.part.1+0x4b3/0x770 [lnet]
            [ 4809.802589] Code: 00 00 48 c7 05 da 55 07 00 00 00 00 00 e8 f5 88 fb ff e9 05 fc ff ff ba 00 01 00 00 89 ee 48 c7 c7 38 49 7d c0 e8 c3 d7 91 d6 <0f> 0b e9 f6 fc ff ff f6 05 4b 38 fd ff 10 49 c7 c4 f4 ff ff ff 0f
            [ 4809.802838] RSP: 0018:ffff972524367c10 EFLAGS: 00010282
            [ 4809.802918] RAX: 0000000000000025 RBX: 0000000000000168 RCX: 0000000000000007
            [ 4809.803028] RDX: 0000000000000007 RSI: 0000000000000022 RDI: ffff97256cbe5450
            [ 4809.803147] RBP: 0000000000000101 R08: 000004cd62abfed5 R09: 0000000000000000
            [ 4809.803257] R10: 0000000000000001 R11: 00000000ffffffff R12: ffff97247bb2a000
            [ 4809.803367] R13: 0000000000000000 R14: ffff972242f00168 R15: ffff972524367c90
            [ 4809.803477] FS:  0000000000000000(0000) GS:ffff97256ca00000(0000) knlGS:0000000000000000
            [ 4809.803587] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
            [ 4809.803679] CR2: 000055aeee75bf44 CR3: 00000003104c2000 CR4: 00000000000006b0
            [ 4809.803792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
            [ 4809.803910] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
            [ 4809.804020] Call Trace:
            [ 4809.804074]  LNetMDBind+0x3e/0x380 [lnet]
            [ 4809.804193]  ptl_send_buf+0x117/0x570 [ptlrpc]
            [ 4809.804307]  ? at_measured+0x1eb/0x300 [ptlrpc]
            [ 4809.804419]  ? reply_out_callback+0x390/0x390 [ptlrpc]
            [ 4809.804529]  ptlrpc_send_reply+0x2b4/0x8f0 [ptlrpc]
            [ 4809.804647]  target_send_reply+0x343/0x770 [ptlrpc]
            [ 4809.804768]  tgt_request_handle+0x9ca/0x1a40 [ptlrpc]
            [ 4809.804885]  ? lustre_msg_get_transno+0x6f/0xd0 [ptlrpc]
            [ 4809.804998]  ptlrpc_main+0x1784/0x32a0 [ptlrpc]
            [ 4809.805083]  ? __kthread_parkme+0x33/0x90
            [ 4809.805172]  ? ptlrpc_wait_event+0x4b0/0x4b0 [ptlrpc]
            [ 4809.805254]  kthread+0x129/0x140
            [ 4809.805314]  ? kthread_flush_work_fn+0x10/0x10
            [ 4809.805391]  ret_from_fork+0x1f/0x30
            [ 4809.805450] ---[ end trace 02ea061606b4429e ]---
            
            pjones Peter Jones added a comment -

            Landed for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/46540/
            Subject: LU-15550 ptlrpc: retry mechanism for overflowed batched RPCs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 668f48f87bec3999892ce1daad24b6dba9ae362b

            gerrit Gerrit Updater added a comment -

            "Yingjin Qian <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/46540
            Subject: LU-15550 ptlrpc: retry mechanism for overflowed batched RPCs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 3a33b10c3169056c5837869482fe96a6ea814a14


            People

              Assignee: qian_wc Qian Yingjin
              Reporter: qian_wc Qian Yingjin
              Votes: 0
              Watchers: 4
