Details

    • New Feature
    • Resolution: Fixed
    • Minor
    • Lustre 2.12.0

    Description

      Limit the rate of ptlrpc requests by uid or gid. This way, the TBF policy can provide QoS control for a user or user group identified by uid or gid.
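
For illustration only: a minimal sketch of the token-bucket idea behind TBF, with one bucket per uid (or gid). The struct, function names, and nanosecond clock are assumptions made for this sketch and are not taken from nrs_tbf.c; the real policy also handles rule matching, queuing, and deadlines.

```c
#include <stdbool.h>
#include <stdint.h>

#define TB_NSEC_PER_SEC 1000000000ULL

/* Hypothetical per-key bucket; names are illustrative, not from nrs_tbf.c. */
struct tb_bucket {
	uint32_t key;     /* uid or gid that this bucket limits */
	uint64_t rate;    /* tokens (requests) added per second */
	uint64_t depth;   /* maximum number of tokens the bucket holds */
	uint64_t tokens;  /* tokens currently available */
	uint64_t last_ns; /* time of the last refill, in nanoseconds */
};

/* Start with a full bucket so an idle user may burst up to `depth`. */
static void tb_init(struct tb_bucket *b, uint32_t key, uint64_t rate,
		    uint64_t depth, uint64_t now_ns)
{
	b->key = key;
	b->rate = rate;
	b->depth = depth;
	b->tokens = depth;
	b->last_ns = now_ns;
}

/*
 * Refill according to elapsed time, then try to take one token.
 * Returns true if the request may proceed now, false if it must wait.
 * Sketch assumption: elapsed * rate does not overflow 64 bits.
 */
static bool tb_try_consume(struct tb_bucket *b, uint64_t now_ns)
{
	uint64_t elapsed = now_ns - b->last_ns;
	uint64_t add = elapsed * b->rate / TB_NSEC_PER_SEC;

	if (add > 0) {
		uint64_t total = b->tokens + add;

		b->tokens = total > b->depth ? b->depth : total;
		/* Advance only by the time that produced whole tokens. */
		b->last_ns += add * TB_NSEC_PER_SEC / b->rate;
	}
	if (b->tokens == 0)
		return false;
	b->tokens--;
	return true;
}
```

A server-side scheduler would look up the bucket by the uid or gid of the incoming request and hold the request back whenever tb_try_consume() returns false.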

          Activity

            [LU-9658] Add QoS for uid and gid in NRS-TBF

            jaylan Jay Lan (Inactive) added a comment -

            If I am going to apply this patch to b2_10 (2.10.3), are there any other patches I need to pick up for it to work correctly? (This patch applied cleanly to b2_10.)

            pjones Peter Jones added a comment -

            Landed for 2.12


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27608/
            Subject: LU-9658 ptlrpc: Add QoS for uid and gid in NRS-TBF
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e0cdde123c14729a340cf937cf9580be7c9dd9c1

            qian Qian Yingjin (Inactive) added a comment -

            With the latest master branch, it panics:

            [ 3461.027650] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
            [ 3461.027653] IP: [<ffffffffa0b0af84>] nrs_tbf_ctl+0x194/0x590 [ptlrpc]
            [ 3461.027694] PGD 4faa8067 PUD 3d2c1067 PMD 0
            [ 3461.027696] Oops: 0000 [#1] SMP
            [ 3461.027697] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) loop fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bnep vmw_vsock_vmci_transport vsock dm_mirror dm_region_hash dm_log dm_mod snd_seq_midi snd_seq_midi_event intel_powerclamp iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel snd_ens1371 lrw gf128mul glue_helper ablk_helper cryptd snd_rawmidi
            [ 3461.027713]  snd_ac97_codec ppdev pcspkr vmw_balloon ac97_bus snd_seq snd_seq_device snd_pcm btusb btrtl btbcm btintel bluetooth snd_timer snd rfkill soundcore nfit sg libnvdimm parport_pc parport shpchp vmw_vmci i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel serio_raw ata_piix vmwgfx e1000 mptspi scsi_transport_spi mptscsih mptbase drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm i2c_core libata fjes [last unloaded: libcfs]
            [ 3461.027730] CPU: 0 PID: 8846 Comm: lt-lctl Tainted: G           OE  ------------   3.10.0-514.21.1.el7_lustre.2.9.59_35_gc1d70a4.x86_64 #1
            [ 3461.027731] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
            [ 3461.027732] task: ffff880060cfedd0 ti: ffff88004fa28000 task.ti: ffff88004fa28000
            [ 3461.027733] RIP: 0010:[<ffffffffa0b0af84>]  [<ffffffffa0b0af84>] nrs_tbf_ctl+0x194/0x590 [ptlrpc]
            [ 3461.027756] RSP: 0018:ffff88004fa2bda8  EFLAGS: 00010293
            [ 3461.027757] RAX: 0000000000160014 RBX: 0000000000000021 RCX: 0000000000000016
            [ 3461.027757] RDX: 0000000000000001 RSI: 0000000000000021 RDI: ffff880015acdce0
            [ 3461.027758] RBP: ffff88004fa2bdd0 R08: 0000000000000000 R09: ffff880015acdc00
            [ 3461.027758] R10: ffff88007b619ae0 R11: ffffea00013ea400 R12: 0000000000000001
            [ 3461.027759] R13: ffff8800158ed380 R14: ffff8800157c7800 R15: 0000000000000000
            [ 3461.027760] FS:  00007fb6bcfec740(0000) GS:ffff88007b600000(0000) knlGS:0000000000000000
            [ 3461.027760] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
            [ 3461.027761] CR2: 0000000000000001 CR3: 000000003f0fe000 CR4: 00000000003407f0
            [ 3461.027777] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
            [ 3461.027785] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
            [ 3461.027786] Stack:
            [ 3461.027786]  ffff8800158ed380 ffff880015acdce0 0000000000000021 0000000000000001
            [ 3461.027787]  0000000000000000 ffff88004fa2be00 ffffffffa0afaa15 0000000000000001
            [ 3461.027788]  0000000000000002 0000000000000021 ffff88004bfe9600 ffff88004fa2be58
            [ 3461.027789] Call Trace:
            [ 3461.027810]  [<ffffffffa0afaa15>] nrs_policy_ctl+0x115/0x2b0 [ptlrpc]
            [ 3461.027837]  [<ffffffffa0afd10f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
            [ 3461.027857]  [<ffffffffa0b0bd15>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x995/0x1020 [ptlrpc]
            [ 3461.027860]  [<ffffffff812012e8>] ? __sb_start_write+0x58/0x110
            [ 3461.027862]  [<ffffffff8126bdbd>] proc_reg_write+0x3d/0x80
            [ 3461.027863]  [<ffffffff811fe82d>] vfs_write+0xbd/0x1e0
            [ 3461.027865]  [<ffffffff8120f24d>] ? putname+0x3d/0x60
            [ 3461.027866]  [<ffffffff811ff34f>] SyS_write+0x7f/0xe0
            [ 3461.027867]  [<ffffffff81697849>] system_call_fastpath+0x16/0x1b
            [ 3461.027868] Code: 83 57 bd ff 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 84 00 00 00 00 00 8b 07 4d 8b 75 58 89 c1 c1 e9 10 66 39 c8 0f 84 f0 03 00 00 <41> 8b 04 24 83 f8 01 0f 84 1f 01 00 00 0f 82 b9 01 00 00 83 f8
            [ 3461.027878] RIP  [<ffffffffa0b0af84>] nrs_tbf_ctl+0x194/0x590 [ptlrpc]
            [ 3461.027897]  RSP <ffff88004fa2bda8>
            [ 3461.027898] CR2: 0000000000000001

            Teddy Teddy added a comment - edited

            Hi, Andreas. I have spent the last few days analyzing the ptlrpc code and its processing logic on the server side (both MDT and OST). Initializing rq_pill in every request is a feasible solution, although it is tedious. I have implemented it without breaking the original ptlrpc handling logic, and I have uploaded the patch.

            In TBF, I think we do not need to limit every request. Limiting only the requests that have a large impact on performance (such as read, write, getattr, and setattr) is enough. Others (such as connect, statfs, quotactl, and ping) can simply be let through. Besides, if we need to control every request from the client, I think modifying the request protocol is inevitable, and it is also preferable in the long run.
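
For illustration only: a minimal sketch of the selective limiting described above, where performance-critical request kinds are subject to the uid/gid limit and housekeeping requests bypass it. The enum and function are hypothetical and do not use Lustre's real opcode names or values.

```c
#include <stdbool.h>

/* Hypothetical request kinds; these are NOT Lustre's real opcode values. */
enum req_kind {
	REQ_READ,
	REQ_WRITE,
	REQ_GETATTR,
	REQ_SETATTR,
	REQ_CONNECT,
	REQ_STATFS,
	REQ_QUOTACTL,
	REQ_PING,
};

/* Return true if the request should be subject to the uid/gid rate limit. */
static bool req_is_rate_limited(enum req_kind kind)
{
	switch (kind) {
	case REQ_READ:
	case REQ_WRITE:
	case REQ_GETATTR:
	case REQ_SETATTR:
		return true;
	default:
		/* connect, statfs, quotactl, ping: let them through */
		return false;
	}
}
```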


            adilger Andreas Dilger added a comment -

            Changing the protocol means that this feature likely cannot land until 2.12 and will not be easy to backport to 2.10 or earlier releases.

            I agree that re-using the existing uid/gid fields adds some complexity, but it allows this feature to work with existing clients and servers much more easily.

            Why not initialize rq_pill in all cases to avoid this problem?

            Teddy Teddy added a comment -

            Hi, Peter. In my earlier patches, I tried to unpack the uid and gid from the ptlrpc request to avoid changing the protocol. When I looked into the bugs that Mahmoud Hanafi mentioned, I noticed that ptlrpc_request::rq_pill is initialized in ptlrpc_service_ops::so_hpreq_handler. However, not every request goes through so_hpreq_handler:

            mdt/mdt_mds.c: .so_hpreq_handler = ptlrpc_hpreq_handler,
            mdt/mdt_mds.c: .so_hpreq_handler = NULL,
            mdt/mdt_mds.c: .so_hpreq_handler = NULL,
            mdt/mdt_mds.c: .so_hpreq_handler = NULL,
            mdt/mdt_mds.c: .so_hpreq_handler = NULL,
            mdt/mdt_mds.c: .so_hpreq_handler = NULL,

            ost/ost_handler.c: .so_hpreq_handler = ptlrpc_hpreq_handler,
            ost/ost_handler.c: .so_hpreq_handler = tgt_hpreq_handler,
            ost/ost_handler.c: .so_hpreq_handler = NULL,
            ost/ost_handler.c: .so_hpreq_handler = NULL,

            That means not every ptlrpc_request::rq_pill has been initialized before it is used in nrs_tbf.c. When rq_pill is not initialized, calling req_capsule_client_get() triggers an LBUG. Ptlrpc requests from the client to the MDS fall into this case, which is what caused the LBUG.
            I think adding the uid and gid to the ptlrpc protocol addresses this issue easily and keeps the logic simpler and easier to understand. Although it introduces some duplicated fields in the ptlrpc request and breaks the protocol, I think it is worth doing.
            Our idea is as follows:
            1. Add uid and gid to struct ptlrpc_body_v3, similar to jobid.
            2. Combine the processing logic for uid, gid, and jobid.
            3. Handle uid and gid in nrs_tbf the same way as jobid.
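
For illustration only: a minimal sketch of the three steps above, in which the uid and gid travel in the request body next to the jobid, so the scheduler can build its classification key without touching rq_pill. The struct is a hypothetical stand-in and does not reproduce the real ptlrpc_body_v3 layout; the field names, buffer size, and function are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_JOBID_SIZE 32 /* hypothetical buffer size for this sketch */

/* Hypothetical stand-in for the request body; NOT the real ptlrpc_body_v3. */
struct sketch_body {
	uint32_t sb_uid;
	uint32_t sb_gid;
	char     sb_jobid[SKETCH_JOBID_SIZE];
};

enum tbf_key_type { TBF_KEY_JOBID, TBF_KEY_UID, TBF_KEY_GID };

/*
 * Format the classification key for one request into buf.
 * uid, gid, and jobid share one code path, as step 2 proposes.
 * Returns 0 on success, -1 on invalid arguments.
 */
static int tbf_key_from_body(const struct sketch_body *body,
			     enum tbf_key_type type, char *buf, size_t len)
{
	if (body == NULL || buf == NULL || len == 0)
		return -1;

	switch (type) {
	case TBF_KEY_JOBID:
		snprintf(buf, len, "%s", body->sb_jobid);
		return 0;
	case TBF_KEY_UID:
		snprintf(buf, len, "%u", (unsigned int)body->sb_uid);
		return 0;
	case TBF_KEY_GID:
		snprintf(buf, len, "%u", (unsigned int)body->sb_gid);
		return 0;
	}
	return -1;
}
```

Because the key comes from a body that every request carries, this path does not depend on whether the service registered an so_hpreq_handler.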

            I uploaded the patch at https://review.whamcloud.com/#/c/27608/. It needs further work to better merge the jobid and uid/gid code.


            People

              Teddy Teddy
              Teddy Teddy
              Votes: 1
              Watchers: 11
