Details
-
New Feature
-
Resolution: Fixed
-
Minor
Description
Limit the rate of ptlrpc requests by uid or gid. In this way, the TBF policy can provide QoS control for a user or user group identified by uid or gid.
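As a rough illustration of the idea (not the Lustre implementation), per-uid rate limiting can be modelled in user space with one token bucket per uid. The class names, the bucket depth of 3, and the choice to report "no token" instead of queueing the request are assumptions made for this sketch; the real NRS-TBF policy is kernel C code and delays requests until a token is available.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate: float          # tokens (requests) added per second
    depth: float = 3.0   # maximum burst size; this value is an assumption
    tokens: float = 0.0  # tokens currently available
    last: float = 0.0    # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, capped at the bucket depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class UidLimiter:
    """Hypothetical per-uid limiter: one bucket per uid, created on first use."""

    def __init__(self, rules, default_rate):
        self.rules = dict(rules)          # uid -> requests per second
        self.default_rate = default_rate  # rate for uids without a rule
        self.buckets = {}

    def allow(self, uid, now):
        bucket = self.buckets.get(uid)
        if bucket is None:
            rate = self.rules.get(uid, self.default_rate)
            # Start with a single token so the first request passes.
            bucket = TokenBucket(rate=rate, tokens=1.0, last=now)
            self.buckets[uid] = bucket
        return bucket.allow(now)
```

With a rule of 100 requests per second for one uid, two back-to-back requests from that uid consume the initial token and then find the bucket empty, while a request from any other uid falls through to the default rate.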
Activity
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27608/
Subject: LU-9658 ptlrpc: Add QoS for uid and gid in NRS-TBF
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e0cdde123c14729a340cf937cf9580be7c9dd9c1
With the latest master branch, it panics:
[ 3461.027650] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[ 3461.027653] IP: [<ffffffffa0b0af84>] nrs_tbf_ctl+0x194/0x590 [ptlrpc]
[ 3461.027694] PGD 4faa8067 PUD 3d2c1067 PMD 0
[ 3461.027696] Oops: 0000 [#1] SMP
[ 3461.027697] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) loop fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bnep vmw_vsock_vmci_transport vsock dm_mirror dm_region_hash dm_log dm_mod snd_seq_midi snd_seq_midi_event intel_powerclamp iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel snd_ens1371 lrw gf128mul glue_helper ablk_helper cryptd snd_rawmidi
[ 3461.027713] snd_ac97_codec ppdev pcspkr vmw_balloon ac97_bus snd_seq snd_seq_device snd_pcm btusb btrtl btbcm btintel bluetooth snd_timer snd rfkill soundcore nfit sg libnvdimm parport_pc parport shpchp vmw_vmci i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel serio_raw ata_piix vmwgfx e1000 mptspi scsi_transport_spi mptscsih mptbase drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm i2c_core libata fjes [last unloaded: libcfs]
[ 3461.027730] CPU: 0 PID: 8846 Comm: lt-lctl Tainted: G OE ------------ 3.10.0-514.21.1.el7_lustre.2.9.59_35_gc1d70a4.x86_64 #1
[ 3461.027731] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 3461.027732] task: ffff880060cfedd0 ti: ffff88004fa28000 task.ti: ffff88004fa28000
[ 3461.027733] RIP: 0010:[<ffffffffa0b0af84>] [<ffffffffa0b0af84>] nrs_tbf_ctl+0x194/0x590 [ptlrpc]
[ 3461.027756] RSP: 0018:ffff88004fa2bda8 EFLAGS: 00010293
[ 3461.027757] RAX: 0000000000160014 RBX: 0000000000000021 RCX: 0000000000000016
[ 3461.027757] RDX: 0000000000000001 RSI: 0000000000000021 RDI: ffff880015acdce0
[ 3461.027758] RBP: ffff88004fa2bdd0 R08: 0000000000000000 R09: ffff880015acdc00
[ 3461.027758] R10: ffff88007b619ae0 R11: ffffea00013ea400 R12: 0000000000000001
[ 3461.027759] R13: ffff8800158ed380 R14: ffff8800157c7800 R15: 0000000000000000
[ 3461.027760] FS: 00007fb6bcfec740(0000) GS:ffff88007b600000(0000) knlGS:0000000000000000
[ 3461.027760] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3461.027761] CR2: 0000000000000001 CR3: 000000003f0fe000 CR4: 00000000003407f0
[ 3461.027777] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3461.027785] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3461.027786] Stack:
[ 3461.027786] ffff8800158ed380 ffff880015acdce0 0000000000000021 0000000000000001
[ 3461.027787] 0000000000000000 ffff88004fa2be00 ffffffffa0afaa15 0000000000000001
[ 3461.027788] 0000000000000002 0000000000000021 ffff88004bfe9600 ffff88004fa2be58
[ 3461.027789] Call Trace:
[ 3461.027810] [<ffffffffa0afaa15>] nrs_policy_ctl+0x115/0x2b0 [ptlrpc]
[ 3461.027837] [<ffffffffa0afd10f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
[ 3461.027857] [<ffffffffa0b0bd15>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x995/0x1020 [ptlrpc]
[ 3461.027860] [<ffffffff812012e8>] ? __sb_start_write+0x58/0x110
[ 3461.027862] [<ffffffff8126bdbd>] proc_reg_write+0x3d/0x80
[ 3461.027863] [<ffffffff811fe82d>] vfs_write+0xbd/0x1e0
[ 3461.027865] [<ffffffff8120f24d>] ? putname+0x3d/0x60
[ 3461.027866] [<ffffffff811ff34f>] SyS_write+0x7f/0xe0
[ 3461.027867] [<ffffffff81697849>] system_call_fastpath+0x16/0x1b
[ 3461.027868] Code: 83 57 bd ff 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 84 00 00 00 00 00 8b 07 4d 8b 75 58 89 c1 c1 e9 10 66 39 c8 0f 84 f0 03 00 00 <41> 8b 04 24 83 f8 01 0f 84 1f 01 00 00 0f 82 b9 01 00 00 83 f8
[ 3461.027878] RIP [<ffffffffa0b0af84>] nrs_tbf_ctl+0x194/0x590 [ptlrpc]
[ 3461.027897] RSP <ffff88004fa2bda8>
[ 3461.027898] CR2: 0000000000000001
Hi, Andreas. I have spent the last few days analyzing the ptlrpc code and its processing logic on the server side (both mdt and ost). Initializing rq_pill in requests is a feasible solution, although it is tedious to do. I have implemented the related changes without breaking the original ptlrpc handling logic, and I have uploaded the patch.
In TBF, I think we do not need to limit every request. Limiting only the requests that have a large influence on performance (such as read, write, getattr, and setattr) is enough. Others (such as connect, statfs, quotactl, and ping) can simply be let through. Besides, if we do need to control every request from the client, I think modifying the request protocol is inevitable, and it is also preferable in the long run.
Changing the protocol means that this feature can likely not land until 2.12, and will not be easily backported to 2.10 or earlier releases.
I agree that re-using the existing uid/gid fields does add some complexity, but allows this feature to work with existing clients and servers much more easily.
Why not initialize rq_pill in all cases to avoid this problem?
Hi, Peter. In my former patches, I was trying to unpack the uid and gid from the ptlrpc request to avoid changing the protocol. When I looked into the bugs that Mahmoud Hanafi mentioned, I noticed that ptlrpc_request::rq_pill is initialized in ptlrpc_service_ops::so_hpreq_handler. However, not every request goes through so_hpreq_handler:
mdt/mdt_mds.c: .so_hpreq_handler = ptlrpc_hpreq_handler,
mdt/mdt_mds.c: .so_hpreq_handler = NULL,
mdt/mdt_mds.c: .so_hpreq_handler = NULL,
mdt/mdt_mds.c: .so_hpreq_handler = NULL,
mdt/mdt_mds.c: .so_hpreq_handler = NULL,
mdt/mdt_mds.c: .so_hpreq_handler = NULL,
ost/ost_handler.c: .so_hpreq_handler = ptlrpc_hpreq_handler,
ost/ost_handler.c: .so_hpreq_handler = tgt_hpreq_handler,
ost/ost_handler.c: .so_hpreq_handler = NULL,
ost/ost_handler.c: .so_hpreq_handler = NULL,
That means not every ptlrpc_request::rq_pill has been initialized before it is used in nrs_tbf.c. When rq_pill is not initialized, calling req_capsule_client_get() triggers an LBUG. Ptlrpc requests from a client to the MDS fall into this case, which is what caused the LBUG.
I think adding the uid and gid to the ptlrpc protocol would easily address this issue, and it is a better way to keep the logic simple and easy to understand. Although it introduces some duplicated fields in the ptlrpc request and breaks the protocol, I think it is worth doing.
Our idea is as follows:
1. Add uid and gid to struct ptlrpc_body_v3, similar to jobid.
2. Combine the processing logic of uid, gid, and jobid.
3. Handle uid and gid in nrs_tbf in the same way as jobid.
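To make the three steps above concrete, here is a small user-space model of a request body that carries jobid, uid, and gid side by side, with one lookup function shared by all three key types. `RequestBody` and `tbf_key` are hypothetical names invented for this sketch; they do not correspond to the actual fields or functions in the patch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RequestBody:
    # Hypothetical stand-in for the identity fields carried in the request body.
    jobid: str = ""
    uid: Optional[int] = None
    gid: Optional[int] = None


def tbf_key(body: RequestBody, kind: str):
    """Return the classification key for `kind`, or None if the field is absent."""
    if kind == "jobid":
        return ("jobid", body.jobid) if body.jobid else None
    if kind in ("uid", "gid"):
        value = getattr(body, kind)
        return (kind, value) if value is not None else None
    raise ValueError(f"unknown TBF key type: {kind!r}")
```

Because the identity travels in the body itself, the lookup does not depend on any per-service unpacking state; a request whose field is missing simply yields no key and can be handled by a default rule.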
I uploaded the patch at https://review.whamcloud.com/#/c/27608/. It needs further improvement to better merge the jobid and uid/gid code paths.
We have started to test this patch at NASA.
Setting up a simple case:
lctl set_param mds.MDS.mdt.nrs_policies="tbf uid"
lctl set_param mds.MDS.mdt.nrs_tbf_rule="start ext_ug uid={11312} rate=100"
nbp7-mds2 /proc/fs/lustre/mds/MDS/mdt # cat nrs_tbf_rule
regular_requests:
CPT 0:
ext_ug {11312} 100, ref 0
default {*} 10000, ref 0
CPT 1:
ext_ug {11312} 100, ref 0
default {*} 10000, ref 0
high_priority_requests:
CPT 0:
ext_ug {11312} 100, ref 0
default {*} 10000, ref 1
CPT 1:
ext_ug {11312} 100, ref 0
default {*} 10000, ref 1
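For readers unfamiliar with the rule string, the following sketch pulls the rule name, key type, id list, and rate out of a command of the form used above, written with plain braces. The regular expression and the function name `parse_start_rule` are assumptions for illustration; they cover only this one form (including the guess that several ids would be space separated) and are not the kernel's actual parser or the full rule grammar.

```python
import re

# Matches only the single form: start <name> uid|gid={<ids>} rate=<n>
RULE_RE = re.compile(
    r"^start\s+(?P<name>\S+)\s+(?P<kind>uid|gid)=\{(?P<ids>[^}]*)\}\s+rate=(?P<rate>\d+)$"
)


def parse_start_rule(text):
    """Split a 'start' rule string into its parts; raise ValueError if it does not match."""
    match = RULE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized rule: {text!r}")
    # Assumption: multiple ids inside the braces are separated by whitespace.
    ids = [int(token) for token in match.group("ids").split()]
    return {
        "name": match.group("name"),
        "kind": match.group("kind"),
        "ids": ids,
        "rate": int(match.group("rate")),
    }
```

Applied to `start ext_ug uid={11312} rate=100`, this yields the name `ext_ug`, the key type `uid`, the id list `[11312]`, and the rate `100`, which matches the `ext_ug {11312} 100` entries shown in the nrs_tbf_rule output.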
Simply running cd into the mount directory triggers a crash:
[ 456.579558] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[ 456.588322] IP: [<ffffffffa0f83132>] nrs_tbf_id_cli_set+0x52/0xa0 [ptlrpc]
[ 456.596114] PGD 811980067 PUD 800580067 PMD 0
[ 456.601103] Oops: 0000 [#1] SMP
Stack traceback for pid 3081
0xffff880762acbf40 3081 2 1 3 R 0xffff880762acc5d8 *mdt01_000
ffff8800a72cfcb8 0000000000000018 ffffffffa0f83100 ffff8810124a1400 ffff880f7fbfc100 ffff880f7fbfc100 ffff8800a72cfd00 ffffffffa0f831f7 00000010a72cfdb0 0000000000000000 00000000fcdbf389 ffff8810124a1400
Call Trace:
[<ffffffffa0f83100>] ? nrs_tbf_id_cli_set+0x20/0xa0 [ptlrpc]
[<ffffffffa0f831f7>] ? nrs_tbf_id_cli_find+0x37/0xa0 [ptlrpc]
[<ffffffffa0f85f83>] ? nrs_tbf_res_get+0x43/0x4c0 [ptlrpc]
[<ffffffffa0f7839c>] ? nrs_resource_get+0x7c/0x100 [ptlrpc]
[<ffffffffa0f78910>] ? nrs_resource_get_safe+0x80/0xf0 [ptlrpc]
[<ffffffffa0f7c3e3>] ? ptlrpc_nrs_req_initialize+0x83/0x100 [ptlrpc]
[<ffffffffa0f4b211>] ? ptlrpc_main+0x1771/0x1e40 [ptlrpc]
[<ffffffff81029557>] ? __switch_to+0xd7/0x4c0
[<ffffffff81689b00>] ? __schedule+0x2e0/0x8a0
[<ffffffffa0f49aa0>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
[<ffffffff810ad9ef>] ? kthread+0xcf/0xe0
[<ffffffff810ad920>] ? insert_kthread_work+0x40/0x40
[<ffffffff81695ad8>] ? ret_from_fork+0x58/0x90
[<ffffffff810ad920>] ? insert_kthread_work+0x40/0x40
crash> bt -t
PID: 13793 TASK: ffff8807fb598fd0 CPU: 9 COMMAND: "mdt01_001"
START: machine_kexec at ffffffff8105a3ab
[ffff880780ddba18] machine_kexec at ffffffff8105a3ab
[ffff880780ddba78] __crash_kexec at ffffffff811015b2
[ffff880780ddbb00] __crash_kexec at ffffffff8110164f
[ffff880780ddbb48] panic at ffffffff8167f10d
[ffff880780ddbbc8] lbug_with_loc at ffffffffa0913854 [libcfs]
[ffff880780ddbbe8] __req_capsule_get at ffffffffa0c8ecc8 [ptlrpc]
[ffff880780ddbc68] req_capsule_client_get at ffffffffa0c8ed75 [ptlrpc]
[ffff880780ddbc78] unpack_ugid_from_mdt_body at ffffffffa0cb0f60 [ptlrpc]
[ffff880780ddbc90] mdt_tbf_id_cli_set at ffffffffa0cb10bb [ptlrpc]
[ffff880780ddbcb0] nrs_tbf_id_cli_set at ffffffffa0cb1163 [ptlrpc]
[ffff880780ddbcd8] nrs_tbf_id_cli_find at ffffffffa0cb11f7 [ptlrpc]
[ffff880780ddbd08] nrs_tbf_res_get at ffffffffa0cb3f83 [ptlrpc]
[ffff880780ddbd50] nrs_resource_get at ffffffffa0ca639c [ptlrpc]
[ffff880780ddbd98] nrs_resource_get_safe at ffffffffa0ca6910 [ptlrpc]
[ffff880780ddbdd0] ptlrpc_nrs_req_initialize at ffffffffa0caa3e3 [ptlrpc]
[ffff880780ddbde8] ptlrpc_main at ffffffffa0c79211 [ptlrpc]
[ffff880780ddbdf8] __switch_to at ffffffff81029557
[ffff880780ddbe58] __schedule at ffffffff81689b00
[ffff880780ddbea8] ptlrpc_main at ffffffffa0c77aa0 [ptlrpc]
[ffff880780ddbec8] kthread at ffffffff810ad9ef
[ffff880780ddbf30] kthread at ffffffff810ad920
[ffff880780ddbf50] ret_from_fork at ffffffff81695ad8
[ffff880780ddbf80] kthread at ffffffff810ad920
Anonymous Coward (jjkky@yahoo.com) uploaded a new patch: https://review.whamcloud.com/27608
Subject: LU-9658 ptlrpc: Add QoS for uid and gid in NRS-TBF
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f7c2c29d69ec22095c721888bd54b7b860d5c19f
If I am going to apply this patch to b2_10 (2.10.3), is there any other patch I need to pick up also for this patch to work correctly? (This patch was applied to b2_10 cleanly.)