[LU-14724] Kernel crash when setting NRS TBF policy Created: 24/May/21 Updated: 27/Jun/22 Resolved: 17/Sep/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Li Xi | Assignee: | Qian Yingjin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
I was using a simple script to start different types of TBF and check how the performance changes, but got the following crash:
[ 1227.167592] ------------[ cut here ]------------ [ 1227.169778] WARNING: CPU: 0 PID: 8312 at lib/list_debug.c:29 __list_add+0x65/0xc0 [ 1227.172968] list_add corruption. next->prev should be prev (ffff9a3576864b18), but was ffff9a3536545410. (next=ffff9a3535dfa818). [ 1227.177552] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops iosf_mbi crc32_pclmul drm ppdev ghash_clmulni_intel cryptd joydev pcspkr i2c_piix4 drm_panel_orientation_quirks virtio_balloon parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw virtio_pci virtio_ring virtio [ 1227.212638] CPU: 0 PID: 8312 Comm: lctl Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1 [ 1227.222441] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1227.226826] Call Trace: [ 1227.230063] [<ffffffff96365147>] dump_stack+0x19/0x1b [ 1227.233960] [<ffffffff95c98848>] __warn+0xd8/0x100 [ 1227.237733] [<ffffffff95c988cf>] warn_slowpath_fmt+0x5f/0x80 [ 1227.241797] [<ffffffff95dea874>] ? 
handle_pte_fault+0x2f4/0xd10 [ 1227.245918] [<ffffffff95f96ca5>] __list_add+0x65/0xc0 [ 1227.249896] [<ffffffffc091b2c3>] nrs_tbf_ctl+0x433/0x580 [ptlrpc] [ 1227.253730] [<ffffffffc0909af9>] nrs_policy_ctl+0x119/0x2c0 [ptlrpc] [ 1227.257688] [<ffffffffc090c27f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc] [ 1227.261892] [<ffffffffc091a805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc] [ 1227.266550] [<ffffffff95e42700>] vfs_write+0xc0/0x1f0 [ 1227.270192] [<ffffffff95e4351f>] SyS_write+0x7f/0xf0 [ 1227.273826] [<ffffffff96377ddb>] system_call_fastpath+0x22/0x27 [ 1227.277613] ---[ end trace 7bf73ec94c8b9c7f ]--- [ 1227.280879] ------------[ cut here ]------------ [ 1227.284118] WARNING: CPU: 0 PID: 8312 at lib/list_debug.c:36 __list_add+0x8a/0xc0 [ 1227.288315] list_add double add: new=ffff9a3576864b18, prev=ffff9a3576864b18, next=ffff9a3535dfa818. [ 1227.292960] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops iosf_mbi crc32_pclmul drm ppdev ghash_clmulni_intel cryptd joydev pcspkr i2c_piix4 drm_panel_orientation_quirks virtio_balloon parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw virtio_pci virtio_ring virtio [ 1227.323006] CPU: 0 PID: 8312 Comm: lctl Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1 [ 1227.330409] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1227.333879] Call Trace: [ 1227.336297] [<ffffffff96365147>] dump_stack+0x19/0x1b [ 1227.339376] [<ffffffff95c98848>] __warn+0xd8/0x100 [ 1227.342292] [<ffffffff95c988cf>] warn_slowpath_fmt+0x5f/0x80 [ 1227.345366] [<ffffffff95dea874>] ? 
handle_pte_fault+0x2f4/0xd10 [ 1227.348480] [<ffffffff95f96cca>] __list_add+0x8a/0xc0 [ 1227.351379] [<ffffffffc091b2c3>] nrs_tbf_ctl+0x433/0x580 [ptlrpc] [ 1227.354469] [<ffffffffc0909af9>] nrs_policy_ctl+0x119/0x2c0 [ptlrpc] [ 1227.357642] [<ffffffffc090c27f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc] [ 1227.361078] [<ffffffffc091a805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc] [ 1227.364632] [<ffffffff95e42700>] vfs_write+0xc0/0x1f0 [ 1227.367915] [<ffffffff95e4351f>] SyS_write+0x7f/0xf0 [ 1227.371199] [<ffffffff96377ddb>] system_call_fastpath+0x22/0x27 [ 1227.374733] ---[ end trace 7bf73ec94c8b9c80 ]--- [ 1227.377751] ------------[ cut here ]------------ [ 1227.380740] WARNING: CPU: 0 PID: 8312 at lib/list_debug.c:29 __list_add+0x65/0xc0 [ 1227.384566] list_add corruption. next->prev should be prev (ffff9a3576864a18), but was ffff9a357b3cb010. (next=ffff9a3535dfa518). [ 1227.391426] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops iosf_mbi crc32_pclmul drm ppdev ghash_clmulni_intel cryptd joydev pcspkr i2c_piix4 drm_panel_orientation_quirks virtio_balloon parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw virtio_pci virtio_ring virtio [ 1227.416905] CPU: 0 PID: 8312 Comm: lctl Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1 [ 1227.422865] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1227.425817] Call Trace: [ 1227.428091] [<ffffffff96365147>] dump_stack+0x19/0x1b [ 1227.430940] [<ffffffff95c98848>] __warn+0xd8/0x100 [ 1227.433708] [<ffffffff95c988cf>] warn_slowpath_fmt+0x5f/0x80 [ 1227.436657] [<ffffffff95dea874>] ? 
handle_pte_fault+0x2f4/0xd10 [ 1227.439639] [<ffffffff95f96ca5>] __list_add+0x65/0xc0 [ 1227.442494] [<ffffffffc091b2c3>] nrs_tbf_ctl+0x433/0x580 [ptlrpc] [ 1227.445569] [<ffffffffc0909af9>] nrs_policy_ctl+0x119/0x2c0 [ptlrpc] [ 1227.448743] [<ffffffffc090c22d>] ptlrpc_nrs_policy_control+0xbd/0x2d0 [ptlrpc] [ 1227.452066] [<ffffffffc091a805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc] [ 1227.455594] [<ffffffff95e42700>] vfs_write+0xc0/0x1f0 [ 1227.458393] [<ffffffff95e4351f>] SyS_write+0x7f/0xf0 [ 1227.461229] [<ffffffff96377ddb>] system_call_fastpath+0x22/0x27 [ 1227.464742] ---[ end trace 7bf73ec94c8b9c81 ]--- [ 1227.467819] ------------[ cut here ]------------ [ 1227.470848] WARNING: CPU: 0 PID: 8312 at lib/list_debug.c:36 __list_add+0x8a/0xc0 [ 1227.474738] list_add double add: new=ffff9a3576864a18, prev=ffff9a3576864a18, next=ffff9a3535dfa518. [ 1227.479284] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops iosf_mbi crc32_pclmul drm ppdev ghash_clmulni_intel cryptd joydev pcspkr i2c_piix4 drm_panel_orientation_quirks virtio_balloon parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw virtio_pci virtio_ring virtio [ 1227.509176] CPU: 0 PID: 8312 Comm: lctl Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1 [ 1227.516423] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1227.519942] Call Trace: [ 1227.522420] [<ffffffff96365147>] dump_stack+0x19/0x1b [ 1227.525517] [<ffffffff95c98848>] __warn+0xd8/0x100 [ 1227.528534] [<ffffffff95c988cf>] warn_slowpath_fmt+0x5f/0x80 [ 1227.531673] [<ffffffff95dea874>] ? 
handle_pte_fault+0x2f4/0xd10 [ 1227.534916] [<ffffffff95f96cca>] __list_add+0x8a/0xc0 [ 1227.537959] [<ffffffffc091b2c3>] nrs_tbf_ctl+0x433/0x580 [ptlrpc] [ 1227.541268] [<ffffffffc0909af9>] nrs_policy_ctl+0x119/0x2c0 [ptlrpc] [ 1227.544551] [<ffffffffc090c22d>] ptlrpc_nrs_policy_control+0xbd/0x2d0 [ptlrpc] [ 1227.547998] [<ffffffffc091a805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc] [ 1227.551691] [<ffffffff95e42700>] vfs_write+0xc0/0x1f0 [ 1227.554551] [<ffffffff95e4351f>] SyS_write+0x7f/0xf0 [ 1227.557286] [<ffffffff96377ddb>] system_call_fastpath+0x22/0x27 [ 1227.560224] ---[ end trace 7bf73ec94c8b9c82 ]--- [ 1257.728282] ------------[ cut here ]------------ [ 1257.732804] WARNING: CPU: 0 PID: 8496 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0 [ 1257.738365] list_del corruption. prev->next should be ffff9a3535dfa818, but was ffff9a3576864b18 [ 1257.743840] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops iosf_mbi crc32_pclmul drm ppdev ghash_clmulni_intel cryptd joydev pcspkr i2c_piix4 drm_panel_orientation_quirks virtio_balloon parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw virtio_pci virtio_ring virtio [ 1257.777458] CPU: 0 PID: 8496 Comm: lctl Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1 [ 1257.784986] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1257.788640] Call Trace: [ 1257.791381] [<ffffffff96365147>] dump_stack+0x19/0x1b [ 1257.794802] [<ffffffff95c98848>] __warn+0xd8/0x100 [ 1257.797962] [<ffffffff95c988cf>] warn_slowpath_fmt+0x5f/0x80 [ 1257.801135] [<ffffffff95f96da1>] __list_del_entry+0xa1/0xd0 [ 1257.804190] 
[<ffffffffc091b19e>] nrs_tbf_ctl+0x30e/0x580 [ptlrpc] [ 1257.807269] [<ffffffffc0909af9>] nrs_policy_ctl+0x119/0x2c0 [ptlrpc] [ 1257.810451] [<ffffffffc090c27f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc] [ 1257.813840] [<ffffffffc091a805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc] [ 1257.817437] [<ffffffff95e42700>] vfs_write+0xc0/0x1f0 [ 1257.820276] [<ffffffff95e4351f>] SyS_write+0x7f/0xf0 [ 1257.823071] [<ffffffff96377ddb>] system_call_fastpath+0x22/0x27 [ 1257.826456] ---[ end trace 7bf73ec94c8b9c83 ]--- [ 1257.829223] ------------[ cut here ]------------ [ 1257.832312] WARNING: CPU: 0 PID: 8496 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0 [ 1257.836504] list_del corruption. prev->next should be ffff9a3535dfa518, but was ffff9a3576864a18 [ 1257.840931] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops iosf_mbi crc32_pclmul drm ppdev ghash_clmulni_intel cryptd joydev pcspkr i2c_piix4 drm_panel_orientation_quirks virtio_balloon parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw virtio_pci virtio_ring virtio [ 1257.869996] CPU: 0 PID: 8496 Comm: lctl Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1 [ 1257.877025] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1257.880482] Call Trace: [ 1257.882843] [<ffffffff96365147>] dump_stack+0x19/0x1b [ 1257.885960] [<ffffffff95c98848>] __warn+0xd8/0x100 [ 1257.888990] [<ffffffff95c988cf>] warn_slowpath_fmt+0x5f/0x80 [ 1257.892268] [<ffffffff95f96da1>] __list_del_entry+0xa1/0xd0 [ 1257.895832] [<ffffffffc091b19e>] nrs_tbf_ctl+0x30e/0x580 [ptlrpc] [ 1257.901397] [<ffffffffc0909af9>] 
nrs_policy_ctl+0x119/0x2c0 [ptlrpc] [ 1257.906565] [<ffffffffc090c22d>] ptlrpc_nrs_policy_control+0xbd/0x2d0 [ptlrpc] [ 1257.911435] [<ffffffffc091a805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc] [ 1257.916663] [<ffffffff95e42700>] vfs_write+0xc0/0x1f0 [ 1257.920614] [<ffffffff95e4351f>] SyS_write+0x7f/0xf0 [ 1257.924491] [<ffffffff96377ddb>] system_call_fastpath+0x22/0x27 [ 1257.928491] ---[ end trace 7bf73ec94c8b9c84 ]--- [ 1259.206242] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 [ 1259.211189] IP: [<ffffffffc054c328>] cfs_match_nid+0x48/0xd0 [lnet] [ 1259.215625] PGD 0 [ 1259.218265] Oops: 0000 [#1] SMP [ 1259.221311] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops iosf_mbi crc32_pclmul drm ppdev ghash_clmulni_intel cryptd joydev pcspkr i2c_piix4 drm_panel_orientation_quirks virtio_balloon parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw virtio_pci virtio_ring virtio [ 1259.248355] CPU: 0 PID: 1413 Comm: mdt00_001 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1 [ 1259.256061] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1259.259755] task: ffff9a357a2cc100 ti: ffff9a3579e10000 task.ti: ffff9a3579e10000 [ 1259.264015] RIP: 0010:[<ffffffffc054c328>] [<ffffffffc054c328>] cfs_match_nid+0x48/0xd0 [lnet] [ 1259.268745] RSP: 0018:ffff9a3579e13c40 EFLAGS: 00010217 [ 1259.272250] RAX: ffffffffc029d970 RBX: ffff9a3535dfa528 RCX: 0000000000000000 [ 1259.276331] RDX: 0000000000000001 RSI: ffff9a3535dfa528 RDI: 000200000a0001dd [ 1259.280306] RBP: ffff9a3579e13c78 R08: 0000000000000002 R09: 0000000000000000 [ 
1259.284372] R10: ffffffffc0659fe0 R11: ffff9a357b10f060 R12: 0000000000000002 [ 1259.288421] R13: ffff9a35768b8200 R14: ffff9a357b3cb020 R15: 0000000000000000 [ 1259.292487] FS: 0000000000000000(0000) GS:ffff9a357fc00000(0000) knlGS:0000000000000000 [ 1259.296743] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1259.300290] CR2: 0000000000000028 CR3: 00000000366c4000 CR4: 00000000001606f0 [ 1259.304279] Call Trace: [ 1259.306816] [<ffffffffc0912169>] nrs_tbf_nid_rule_match+0x19/0x20 [ptlrpc] [ 1259.310751] [<ffffffffc0911d7c>] nrs_tbf_rule_match+0x6c/0xd0 [ptlrpc] [ 1259.314523] [<ffffffffc09155a7>] nrs_tbf_res_get+0xb7/0x510 [ptlrpc] [ 1259.318206] [<ffffffffc0907992>] nrs_resource_get+0x82/0x100 [ptlrpc] [ 1259.322038] [<ffffffffc0907f10>] nrs_resource_get_safe+0x80/0xf0 [ptlrpc] [ 1259.325951] [<ffffffffc090ba73>] ptlrpc_nrs_req_initialize+0x83/0x100 [ptlrpc] [ 1259.329934] [<ffffffffc08d69e1>] ptlrpc_server_handle_req_in+0x751/0xd60 [ptlrpc] [ 1259.333991] [<ffffffffc08dab1d>] ptlrpc_main+0xaad/0x1460 [ptlrpc] [ 1259.337474] [<ffffffff95cd1ad0>] ? finish_task_switch+0x50/0x1c0 [ 1259.341009] [<ffffffffc08da070>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [ 1259.344963] [<ffffffff95cc2e81>] kthread+0xd1/0xe0 [ 1259.348089] [<ffffffff95cc2db0>] ? insert_kthread_work+0x40/0x40 [ 1259.351584] [<ffffffff96377c37>] ret_from_fork_nospec_begin+0x21/0x21 [ 1259.355185] [<ffffffff95cc2db0>] ? insert_kthread_work+0x40/0x40 [ 1259.358637] Code: c1 ec 10 0f b7 c0 48 89 f3 48 83 ec 10 4c 8b 3e 48 89 7d c8 89 45 d0 49 39 f7 75 0f eb 7d 0f 1f 44 00 00 4d 8b 3f 49 39 df 74 70 <49> 8b 47 28 44 39 20 75 ef 8b 55 d0 41 39 57 30 75 e6 41 8b 57 [ 1259.369540] RIP [<ffffffffc054c328>] cfs_match_nid+0x48/0xd0 [lnet] [ 1259.373156] RSP <ffff9a3579e13c40> [ 1259.375776] CR2: 0000000000000028 |
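The WARNING lines above come from the kernel's list debugging (CONFIG_DEBUG_LIST, lib/list_debug.c), which validates the neighbouring pointers before every list_add and list_del. The sketch below is a userspace approximation of the three list_add checks, written only for illustration: checked_list_add, list_is_consistent and the BAD_* flags are hypothetical names, not kernel APIs, and the warn-then-link behaviour is an assumption modelled on the 3.10-era __list_add(). It shows that inserting a node relative to itself (new == prev) is reported as a "double add" and leaves the node pointing at itself, so a forward walk can no longer get back to the list head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace copy of the kernel's intrusive doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* Conditions the debug checks warn about (line numbers as seen in the log). */
enum {
	BAD_NEXT_PREV = 1 << 0, /* "next->prev should be prev" (list_debug.c:29) */
	BAD_PREV_NEXT = 1 << 1, /* "prev->next should be next" */
	DOUBLE_ADD    = 1 << 2, /* "list_add double add" (list_debug.c:36) */
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert new right after head. Bad states are reported, not prevented:
 * like the assumed 3.10-era behaviour, it prints and then links anyway. */
static int checked_list_add(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head, *next = head->next;
	int bad = 0;

	assert(new != NULL && head != NULL);
	if (next->prev != prev) {
		printf("list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
		       (void *)prev, (void *)next->prev, (void *)next);
		bad |= BAD_NEXT_PREV;
	}
	if (prev->next != next) {
		printf("list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
		       (void *)next, (void *)prev->next, (void *)prev);
		bad |= BAD_PREV_NEXT;
	}
	if (new == prev || new == next) {
		printf("list_add double add: new=%p, prev=%p, next=%p.\n",
		       (void *)new, (void *)prev, (void *)next);
		bad |= DOUBLE_ADD;
	}
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new; /* when prev == new this makes new point at itself */
	return bad;
}

/* Walk forward at most max_steps nodes: the list is sane only if every
 * successor points back to its predecessor and the walk returns to head. */
static bool list_is_consistent(const struct list_head *head, int max_steps)
{
	const struct list_head *n = head;

	for (int i = 0; i <= max_steps; i++) {
		if (n->next->prev != n)
			return false;
		n = n->next;
		if (n == head)
			return true;
	}
	return false;
}
```

For example, building head -> a -> b with two normal adds and then calling checked_list_add(&a, &a) raises DOUBLE_ADD, after which list_is_consistent() reports the list as broken.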
| Comments |
| Comment by Li Xi [ 24/May/21 ] |
|
Another crash on another MDT: [ 1565.214951] ------------[ cut here ]------------ [ 1565.221179] WARNING: CPU: 0 PID: 13300 at lib/list_debug.c:29 __list_add+0x65/0xc0 [ 1565.228134] list_add corruption. next->prev should be prev (ffff8e50398c0e18), but was ffff8e5039ff6110. (next=ffff8e503ab82f18). [ 1565.240741] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) mbcache jbd2 lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper iosf_mbi crc32_pclmul syscopyarea sysfillrect sysimgblt fb_sys_fops ppdev drm ghash_clmulni_intel joydev cryptd pcspkr virtio_balloon drm_panel_orientation_quirks i2c_piix4 parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk virtio_net ata_piix crct10dif_pclmul crct10dif_common libata crc32c_intel serio_raw virtio_pci virtio_ring virtio [ 1565.281554] CPU: 0 PID: 13300 Comm: lctl Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1 [ 1565.290772] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1565.295060] Call Trace: [ 1565.298501] [<ffffffff95565147>] dump_stack+0x19/0x1b [ 1565.302598] [<ffffffff94e98848>] __warn+0xd8/0x100 [ 1565.306471] [<ffffffff94e988cf>] warn_slowpath_fmt+0x5f/0x80 [ 1565.310716] [<ffffffffc087724c>] ? 
ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x3ec/0x1030 [ptlrpc] [ 1565.315335] [<ffffffff95196ca5>] __list_add+0x65/0xc0 [ 1565.319064] [<ffffffffc08782c3>] nrs_tbf_ctl+0x433/0x580 [ptlrpc] [ 1565.323485] [<ffffffffc0866af9>] nrs_policy_ctl+0x119/0x2c0 [ptlrpc] [ 1565.327993] [<ffffffffc086927f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc] [ 1565.332718] [<ffffffffc0877805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc] [ 1565.337725] [<ffffffff95042700>] vfs_write+0xc0/0x1f0 [ 1565.341479] [<ffffffff9504351f>] SyS_write+0x7f/0xf0 [ 1565.344740] [<ffffffff95577ddb>] system_call_fastpath+0x22/0x27 [ 1565.348476] ---[ end trace 5e55684a67664b6b ]--- [ 1565.352181] ------------[ cut here ]------------ [ 1565.355756] WARNING: CPU: 0 PID: 13300 at lib/list_debug.c:36 __list_add+0x8a/0xc0 [ 1565.360491] list_add double add: new=ffff8e50398c0e18, prev=ffff8e50398c0e18, next=ffff8e503ab82f18. [ 1565.365595] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) mbcache jbd2 lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper iosf_mbi crc32_pclmul syscopyarea sysfillrect sysimgblt fb_sys_fops ppdev drm ghash_clmulni_intel joydev cryptd pcspkr virtio_balloon drm_panel_orientation_quirks i2c_piix4 parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk virtio_net ata_piix crct10dif_pclmul crct10dif_common libata crc32c_intel serio_raw virtio_pci virtio_ring virtio [ 1565.398289] CPU: 0 PID: 13300 Comm: lctl Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1 [ 1565.406004] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1565.409791] Call Trace: [ 1565.412503] [<ffffffff95565147>] dump_stack+0x19/0x1b [ 1565.416049] [<ffffffff94e98848>] __warn+0xd8/0x100 [ 1565.419307] [<ffffffff94e988cf>] warn_slowpath_fmt+0x5f/0x80 [ 1565.422814] 
[<ffffffffc087724c>] ? ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x3ec/0x1030 [ptlrpc] [ 1565.426988] [<ffffffff95196cca>] __list_add+0x8a/0xc0 [ 1565.430066] [<ffffffffc08782c3>] nrs_tbf_ctl+0x433/0x580 [ptlrpc] [ 1565.433358] [<ffffffffc0866af9>] nrs_policy_ctl+0x119/0x2c0 [ptlrpc] [ 1565.436626] [<ffffffffc086927f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc] [ 1565.440073] [<ffffffffc0877805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc] [ 1565.443662] [<ffffffff95042700>] vfs_write+0xc0/0x1f0 [ 1565.446527] [<ffffffff9504351f>] SyS_write+0x7f/0xf0 [ 1565.449358] [<ffffffff95577ddb>] system_call_fastpath+0x22/0x27 [ 1565.452321] ---[ end trace 5e55684a67664b6c ]--- [ 1565.454939] ------------[ cut here ]------------ [ 1565.457505] WARNING: CPU: 0 PID: 13300 at lib/list_debug.c:29 __list_add+0x65/0xc0 [ 1565.460684] list_add corruption. next->prev should be prev (ffff8e50398c0818), but was ffff8e4ff65ec210. (next=ffff8e503ab82518). [ 1565.466212] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) mbcache jbd2 lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper iosf_mbi crc32_pclmul syscopyarea sysfillrect sysimgblt fb_sys_fops ppdev drm ghash_clmulni_intel joydev cryptd pcspkr virtio_balloon drm_panel_orientation_quirks i2c_piix4 parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk virtio_net ata_piix crct10dif_pclmul crct10dif_common libata crc32c_intel serio_raw virtio_pci virtio_ring virtio [ 1565.493553] CPU: 0 PID: 13300 Comm: lctl Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1 [ 1565.501159] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1565.504808] Call Trace: [ 1565.507362] [<ffffffff95565147>] dump_stack+0x19/0x1b [ 1565.510777] [<ffffffff94e98848>] __warn+0xd8/0x100 [ 1565.514126] 
[<ffffffff94e988cf>] warn_slowpath_fmt+0x5f/0x80 [ 1565.517788] [<ffffffffc087724c>] ? ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x3ec/0x1030 [ptlrpc] [ 1565.522371] [<ffffffff95196ca5>] __list_add+0x65/0xc0 [ 1565.525777] [<ffffffffc08782c3>] nrs_tbf_ctl+0x433/0x580 [ptlrpc] [ 1565.529504] [<ffffffffc0866af9>] nrs_policy_ctl+0x119/0x2c0 [ptlrpc] [ 1565.533196] [<ffffffffc086922d>] ptlrpc_nrs_policy_control+0xbd/0x2d0 [ptlrpc] [ 1565.537432] [<ffffffffc0877805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc] [ 1565.542160] [<ffffffff95042700>] vfs_write+0xc0/0x1f0 [ 1565.545567] [<ffffffff9504351f>] SyS_write+0x7f/0xf0 [ 1565.548887] [<ffffffff95577ddb>] system_call_fastpath+0x22/0x27 [ 1565.552486] ---[ end trace 5e55684a67664b6d ]--- [ 1565.556345] ------------[ cut here ]------------ [ 1565.559416] WARNING: CPU: 0 PID: 13300 at lib/list_debug.c:36 __list_add+0x8a/0xc0 [ 1565.563366] list_add double add: new=ffff8e50398c0818, prev=ffff8e50398c0818, next=ffff8e503ab82518. [ 1565.567794] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) mbcache jbd2 lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper iosf_mbi crc32_pclmul syscopyarea sysfillrect sysimgblt fb_sys_fops ppdev drm ghash_clmulni_intel joydev cryptd pcspkr virtio_balloon drm_panel_orientation_quirks i2c_piix4 parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk virtio_net ata_piix crct10dif_pclmul crct10dif_common libata crc32c_intel serio_raw virtio_pci virtio_ring virtio [ 1565.594763] CPU: 0 PID: 13300 Comm: lctl Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1 [ 1565.600997] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1565.604002] Call Trace: [ 1565.606287] [<ffffffff95565147>] dump_stack+0x19/0x1b [ 1565.609156] [<ffffffff94e98848>] 
__warn+0xd8/0x100 [ 1565.611946] [<ffffffff94e988cf>] warn_slowpath_fmt+0x5f/0x80 [ 1565.614962] [<ffffffffc087724c>] ? ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x3ec/0x1030 [ptlrpc] [ 1565.618577] [<ffffffff95196cca>] __list_add+0x8a/0xc0 [ 1565.621495] [<ffffffffc08782c3>] nrs_tbf_ctl+0x433/0x580 [ptlrpc] [ 1565.624628] [<ffffffffc0866af9>] nrs_policy_ctl+0x119/0x2c0 [ptlrpc] [ 1565.627787] [<ffffffffc086922d>] ptlrpc_nrs_policy_control+0xbd/0x2d0 [ptlrpc] [ 1565.631160] [<ffffffffc0877805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc] [ 1565.634742] [<ffffffff95042700>] vfs_write+0xc0/0x1f0 [ 1565.637570] [<ffffffff9504351f>] SyS_write+0x7f/0xf0 [ 1565.640359] [<ffffffff95577ddb>] system_call_fastpath+0x22/0x27 [ 1565.643294] ---[ end trace 5e55684a67664b6e ]--- [ 1595.581246] ------------[ cut here ]------------ [ 1595.586173] WARNING: CPU: 0 PID: 13443 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0 [ 1595.592200] list_del corruption. prev->next should be ffff8e503ab82f18, but was ffff8e50398c0e18 [ 1595.598089] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) mbcache jbd2 lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper iosf_mbi crc32_pclmul syscopyarea sysfillrect sysimgblt fb_sys_fops ppdev drm ghash_clmulni_intel joydev cryptd pcspkr virtio_balloon drm_panel_orientation_quirks i2c_piix4 parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk virtio_net ata_piix crct10dif_pclmul crct10dif_common libata crc32c_intel serio_raw virtio_pci virtio_ring virtio [ 1595.636079] CPU: 0 PID: 13443 Comm: lctl Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1 [ 1595.644034] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1595.647910] Call Trace: [ 1595.650830] [<ffffffff95565147>] dump_stack+0x19/0x1b [ 
1595.654445] [<ffffffff94e98848>] __warn+0xd8/0x100 [ 1595.657941] [<ffffffff94e988cf>] warn_slowpath_fmt+0x5f/0x80 [ 1595.661557] [<ffffffff95196da1>] __list_del_entry+0xa1/0xd0 [ 1595.665201] [<ffffffffc087819e>] nrs_tbf_ctl+0x30e/0x580 [ptlrpc] [ 1595.668992] [<ffffffffc0866af9>] nrs_policy_ctl+0x119/0x2c0 [ptlrpc] [ 1595.673091] [<ffffffffc086927f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc] [ 1595.677170] [<ffffffffc0877805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc] [ 1595.681520] [<ffffffff95042700>] vfs_write+0xc0/0x1f0 [ 1595.685171] [<ffffffff9504351f>] SyS_write+0x7f/0xf0 [ 1595.688739] [<ffffffff95577ddb>] system_call_fastpath+0x22/0x27 [ 1595.692521] ---[ end trace 5e55684a67664b6f ]--- [ 1595.695824] ------------[ cut here ]------------ [ 1595.698981] WARNING: CPU: 0 PID: 13443 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0 [ 1595.703309] list_del corruption. prev->next should be ffff8e503ab82518, but was ffff8e50398c0818 [ 1595.707759] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) mbcache jbd2 lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper iosf_mbi crc32_pclmul syscopyarea sysfillrect sysimgblt fb_sys_fops ppdev drm ghash_clmulni_intel joydev cryptd pcspkr virtio_balloon drm_panel_orientation_quirks i2c_piix4 parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk virtio_net ata_piix crct10dif_pclmul crct10dif_common libata crc32c_intel serio_raw virtio_pci virtio_ring virtio [ 1595.738345] CPU: 0 PID: 13443 Comm: lctl Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1 [ 1595.745448] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1595.748932] Call Trace: [ 1595.751361] [<ffffffff95565147>] dump_stack+0x19/0x1b [ 1595.754438] [<ffffffff94e98848>] __warn+0xd8/0x100 [ 1595.757350] 
[<ffffffff94e988cf>] warn_slowpath_fmt+0x5f/0x80 [ 1595.760570] [<ffffffff95196da1>] __list_del_entry+0xa1/0xd0 [ 1595.763697] [<ffffffffc087819e>] nrs_tbf_ctl+0x30e/0x580 [ptlrpc] [ 1595.767014] [<ffffffffc0866af9>] nrs_policy_ctl+0x119/0x2c0 [ptlrpc] [ 1595.770351] [<ffffffffc086922d>] ptlrpc_nrs_policy_control+0xbd/0x2d0 [ptlrpc] [ 1595.773820] [<ffffffffc0877805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc] [ 1595.777472] [<ffffffff95042700>] vfs_write+0xc0/0x1f0 [ 1595.780332] [<ffffffff9504351f>] SyS_write+0x7f/0xf0 [ 1595.783133] [<ffffffff95577ddb>] system_call_fastpath+0x22/0x27 [ 1595.786068] ---[ end trace 5e55684a67664b70 ]--- [ 1596.990479] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 [ 1596.995855] IP: [<ffffffffc0457328>] cfs_match_nid+0x48/0xd0 [lnet] [ 1597.000313] PGD 8000000035084067 PUD 79bd8067 PMD 0 [ 1597.004324] Oops: 0000 [#1] SMP [ 1597.007632] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) mbcache jbd2 lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper iosf_mbi crc32_pclmul syscopyarea sysfillrect sysimgblt fb_sys_fops ppdev drm ghash_clmulni_intel joydev cryptd pcspkr virtio_balloon drm_panel_orientation_quirks i2c_piix4 parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk virtio_net ata_piix crct10dif_pclmul crct10dif_common libata crc32c_intel serio_raw virtio_pci virtio_ring virtio [ 1597.041311] CPU: 0 PID: 7673 Comm: mdt00_001 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1 [ 1597.049108] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 1597.052462] task: ffff8e5037f71040 ti: ffff8e501c22c000 task.ti: ffff8e501c22c000 [ 1597.056292] RIP: 0010:[<ffffffffc0457328>] [<ffffffffc0457328>] cfs_match_nid+0x48/0xd0 [lnet] [ 1597.060387] RSP: 
0018:ffff8e501c22fc40 EFLAGS: 00010217 [ 1597.064005] RAX: ffffffff9541b420 RBX: ffff8e503ab82528 RCX: 0000000000000000 [ 1597.068236] RDX: 0000000000000001 RSI: ffff8e503ab82528 RDI: 000200000a0001eb [ 1597.072364] RBP: ffff8e501c22fc78 R08: 0000000000000002 R09: 0000000000000000 [ 1597.076485] R10: ffffffffc0564fe0 R11: ffff8e501b163240 R12: 0000000000000002 [ 1597.080530] R13: ffff8e503bda6c00 R14: ffff8e4ff65ec220 R15: 0000000000000000 [ 1597.084545] FS: 0000000000000000(0000) GS:ffff8e503fc00000(0000) knlGS:0000000000000000 [ 1597.088822] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1597.092386] CR2: 0000000000000028 CR3: 000000007be48000 CR4: 00000000001606f0 [ 1597.096769] Call Trace: [ 1597.099744] [<ffffffffc086f169>] nrs_tbf_nid_rule_match+0x19/0x20 [ptlrpc] [ 1597.103989] [<ffffffffc086ed7c>] nrs_tbf_rule_match+0x6c/0xd0 [ptlrpc] [ 1597.108283] [<ffffffffc08725a7>] nrs_tbf_res_get+0xb7/0x510 [ptlrpc] [ 1597.112474] [<ffffffffc0864992>] nrs_resource_get+0x82/0x100 [ptlrpc] [ 1597.116594] [<ffffffffc0864f10>] nrs_resource_get_safe+0x80/0xf0 [ptlrpc] [ 1597.120613] [<ffffffffc0868a73>] ptlrpc_nrs_req_initialize+0x83/0x100 [ptlrpc] [ 1597.124721] [<ffffffffc08339e1>] ptlrpc_server_handle_req_in+0x751/0xd60 [ptlrpc] [ 1597.128740] [<ffffffffc0837b1d>] ptlrpc_main+0xaad/0x1460 [ptlrpc] [ 1597.132249] [<ffffffff94ed1ad0>] ? finish_task_switch+0x50/0x1c0 [ 1597.135701] [<ffffffffc0837070>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [ 1597.139454] [<ffffffff94ec2e81>] kthread+0xd1/0xe0 [ 1597.142406] [<ffffffff94ec2db0>] ? insert_kthread_work+0x40/0x40 [ 1597.145706] [<ffffffff95577c37>] ret_from_fork_nospec_begin+0x21/0x21 [ 1597.148740] [<ffffffff94ec2db0>] ? 
insert_kthread_work+0x40/0x40 [ 1597.151607] Code: c1 ec 10 0f b7 c0 48 89 f3 48 83 ec 10 4c 8b 3e 48 89 7d c8 89 45 d0 49 39 f7 75 0f eb 7d 0f 1f 44 00 00 4d 8b 3f 49 39 df 74 70 <49> 8b 47 28 44 39 20 75 ef 8b 55 d0 41 39 57 30 75 e6 41 8b 57 [ 1597.160777] RIP [<ffffffffc0457328>] cfs_match_nid+0x48/0xd0 [lnet] [ 1597.163897] RSP <ffff8e501c22fc40> [ 1597.166228] CR2: 0000000000000028 |
| Comment by Li Xi [ 24/May/21 ] |
|
The process to trigger the problem is:
1. Change the NRS policy to "tbf", and create two rules of "opcode={ost_write},nid={192.168.*.*@tcp}".
2. Change the rank of a rule.
Then repeat steps 2, 3 and 4 with each of the following rule expressions:
- "gid={500}"
- "uid={500}"
- "uid={dd.*}"
- "nid={192.168.*.*@tcp}"
I am not sure whether the process can be reduced to simpler steps, but it seems quite easy to reproduce. |
| Comment by Li Xi [ 25/May/21 ] |
|
Steps to reproduce:
lctl set_param mds.MDS.mdt.nrs_policies="tbf nid";
lctl set_param mds.MDS.mdt.nrs_tbf_rule="start name nid={192.168.*.*@tcp} rate=10000";
lctl set_param mds.MDS.mdt.nrs_tbf_rule="start name1 nid={192.168.*.*@tcp} rate=10000";
lctl set_param mds.MDS.mdt.nrs_tbf_rule="change name1 rank=name";
lctl set_param mds.MDS.mdt.nrs_tbf_rule="stop name";
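As a rough illustration of why these five commands can corrupt the rule list, the userspace sketch below replays them on a toy list. It assumes that both rules live in one ordered list, that "start" puts the new rule at the front, and that "change name1 rank=name" places name1 immediately before name. The struct and helpers (rule_start, rank_change_buggy, rank_change_safe, rule_stop, replay_repro) are hypothetical: this is not the ptlrpc code, nor necessarily what the patch at https://review.whamcloud.com/43925 does. In the buggy variant, name1 is re-linked without being unlinked first; because it already sits right before name, that is a "double add" with new == prev, and the later "stop name" finds prev->next pointing at name1 instead of name, which resembles the list_del corruption warnings in the log.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Toy stand-in for a TBF rule: only the name and list linkage matter here. */
struct rule {
	const char *name;
	struct list_head link;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Link e between prev and next (the core of the kernel's list insertion). */
static void link_between(struct list_head *e, struct list_head *prev,
			 struct list_head *next)
{
	next->prev = e;
	e->next = next;
	e->prev = prev;
	prev->next = e;
}

/* Assumption: "start" puts a new rule at the front of the list. */
static void rule_start(struct list_head *head, struct rule *r)
{
	link_between(&r->link, head, head->next);
}

/* Hypothetical buggy rank change: re-link r just before next WITHOUT
 * unlinking it first. If r already precedes next, prev == &r->link
 * (a "double add") and r ends up pointing at itself. */
static void rank_change_buggy(struct rule *r, struct rule *next)
{
	struct list_head *prev = next->link.prev;

	link_between(&r->link, prev, prev->next);
}

/* Safer rank change: reject self-ranking, unlink r, then insert it before next. */
static int rank_change_safe(struct rule *r, struct rule *next)
{
	if (r == next)
		return -EINVAL;
	r->link.prev->next = r->link.next;
	r->link.next->prev = r->link.prev;
	link_between(&r->link, next->link.prev, &next->link);
	return 0;
}

/* "stop": unlink r, but refuse (returning false) if its neighbours do not
 * point back at it, similar in spirit to the debug list_del check. */
static bool rule_stop(struct rule *r)
{
	struct list_head *e = &r->link;

	if (e->prev->next != e || e->next->prev != e) {
		printf("list_del corruption while stopping rule '%s'\n", r->name);
		return false;
	}
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);
	return true;
}

/* Replays: start name; start name1; change name1 rank=name; stop name.
 * Returns true if "stop name" found the list intact. */
bool replay_repro(bool use_safe_rank_change)
{
	struct list_head head;
	struct rule name = { .name = "name" };
	struct rule name1 = { .name = "name1" };

	init_list_head(&head);
	rule_start(&head, &name);  /* head -> name */
	rule_start(&head, &name1); /* head -> name1 -> name */

	if (use_safe_rank_change) {
		int rc = rank_change_safe(&name1, &name);

		assert(rc == 0);
		(void)rc;
	} else {
		rank_change_buggy(&name1, &name);
	}
	return rule_stop(&name);
}
```

Unlinking before re-inserting, and rejecting a rule ranked relative to itself, keeps the toy list consistent whether or not the rule is already in the requested position.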
|
| Comment by Gerrit Updater [ 04/Jun/21 ] |
|
Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/43925 |
| Comment by Gerrit Updater [ 17/Sep/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/43925/ |
| Comment by Peter Jones [ 17/Sep/21 ] |
|
Landed for 2.15 |
| Comment by Gerrit Updater [ 07/Jan/22 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/46001 |