Lustre / LU-14724

Kernel crash when setting NRS TBF policy


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Lustre 2.15.0
    • Labels:
      None
    • Severity:
      3

      Description

      I was using a simple script to start different types of TBF policy and check how performance changes, but got the following crash:

      [ 1227.167592] ------------[ cut here ]------------
      [ 1227.169778] WARNING: CPU: 0 PID: 8312 at lib/list_debug.c:29 __list_add+0x65/0xc0
      [ 1227.172968] list_add corruption. next->prev should be prev (ffff9a3576864b18), but was ffff9a3536545410. (next=ffff9a3535dfa818).
      [ 1227.177552] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops iosf_mbi crc32_pclmul drm ppdev ghash_clmulni_intel cryptd joydev pcspkr i2c_piix4 drm_panel_orientation_quirks virtio_balloon parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw virtio_pci virtio_ring virtio
      [ 1227.212638] CPU: 0 PID: 8312 Comm: lctl Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1
      [ 1227.222441] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 1227.226826] Call Trace:
      [ 1227.230063]  [<ffffffff96365147>] dump_stack+0x19/0x1b
      [ 1227.233960]  [<ffffffff95c98848>] __warn+0xd8/0x100
      [ 1227.237733]  [<ffffffff95c988cf>] warn_slowpath_fmt+0x5f/0x80
      [ 1227.241797]  [<ffffffff95dea874>] ? handle_pte_fault+0x2f4/0xd10
      [ 1227.245918]  [<ffffffff95f96ca5>] __list_add+0x65/0xc0
      [ 1227.249896]  [<ffffffffc091b2c3>] nrs_tbf_ctl+0x433/0x580 [ptlrpc]
      [ 1227.253730]  [<ffffffffc0909af9>] nrs_policy_ctl+0x119/0x2c0 [ptlrpc]
      [ 1227.257688]  [<ffffffffc090c27f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
      [ 1227.261892]  [<ffffffffc091a805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc]
      [ 1227.266550]  [<ffffffff95e42700>] vfs_write+0xc0/0x1f0
      [ 1227.270192]  [<ffffffff95e4351f>] SyS_write+0x7f/0xf0
      [ 1227.273826]  [<ffffffff96377ddb>] system_call_fastpath+0x22/0x27
      [ 1227.277613] ---[ end trace 7bf73ec94c8b9c7f ]---
      [ 1227.280879] ------------[ cut here ]------------
      [ 1227.284118] WARNING: CPU: 0 PID: 8312 at lib/list_debug.c:36 __list_add+0x8a/0xc0
      [ 1227.288315] list_add double add: new=ffff9a3576864b18, prev=ffff9a3576864b18, next=ffff9a3535dfa818.
      [ 1227.292960] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops iosf_mbi crc32_pclmul drm ppdev ghash_clmulni_intel cryptd joydev pcspkr i2c_piix4 drm_panel_orientation_quirks virtio_balloon parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw virtio_pci virtio_ring virtio
      [ 1227.323006] CPU: 0 PID: 8312 Comm: lctl Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1
      [ 1227.330409] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 1227.333879] Call Trace:
      [ 1227.336297]  [<ffffffff96365147>] dump_stack+0x19/0x1b
      [ 1227.339376]  [<ffffffff95c98848>] __warn+0xd8/0x100
      [ 1227.342292]  [<ffffffff95c988cf>] warn_slowpath_fmt+0x5f/0x80
      [ 1227.345366]  [<ffffffff95dea874>] ? handle_pte_fault+0x2f4/0xd10
      [ 1227.348480]  [<ffffffff95f96cca>] __list_add+0x8a/0xc0
      [ 1227.351379]  [<ffffffffc091b2c3>] nrs_tbf_ctl+0x433/0x580 [ptlrpc]
      [ 1227.354469]  [<ffffffffc0909af9>] nrs_policy_ctl+0x119/0x2c0 [ptlrpc]
      [ 1227.357642]  [<ffffffffc090c27f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
      [ 1227.361078]  [<ffffffffc091a805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc]
      [ 1227.364632]  [<ffffffff95e42700>] vfs_write+0xc0/0x1f0
      [ 1227.367915]  [<ffffffff95e4351f>] SyS_write+0x7f/0xf0
      [ 1227.371199]  [<ffffffff96377ddb>] system_call_fastpath+0x22/0x27
      [ 1227.374733] ---[ end trace 7bf73ec94c8b9c80 ]---
      [ 1227.377751] ------------[ cut here ]------------
      [ 1227.380740] WARNING: CPU: 0 PID: 8312 at lib/list_debug.c:29 __list_add+0x65/0xc0
      [ 1227.384566] list_add corruption. next->prev should be prev (ffff9a3576864a18), but was ffff9a357b3cb010. (next=ffff9a3535dfa518).
      [ 1227.391426] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops iosf_mbi crc32_pclmul drm ppdev ghash_clmulni_intel cryptd joydev pcspkr i2c_piix4 drm_panel_orientation_quirks virtio_balloon parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw virtio_pci virtio_ring virtio
      [ 1227.416905] CPU: 0 PID: 8312 Comm: lctl Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1
      [ 1227.422865] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 1227.425817] Call Trace:
      [ 1227.428091]  [<ffffffff96365147>] dump_stack+0x19/0x1b
      [ 1227.430940]  [<ffffffff95c98848>] __warn+0xd8/0x100
      [ 1227.433708]  [<ffffffff95c988cf>] warn_slowpath_fmt+0x5f/0x80
      [ 1227.436657]  [<ffffffff95dea874>] ? handle_pte_fault+0x2f4/0xd10
      [ 1227.439639]  [<ffffffff95f96ca5>] __list_add+0x65/0xc0
      [ 1227.442494]  [<ffffffffc091b2c3>] nrs_tbf_ctl+0x433/0x580 [ptlrpc]
      [ 1227.445569]  [<ffffffffc0909af9>] nrs_policy_ctl+0x119/0x2c0 [ptlrpc]
      [ 1227.448743]  [<ffffffffc090c22d>] ptlrpc_nrs_policy_control+0xbd/0x2d0 [ptlrpc]
      [ 1227.452066]  [<ffffffffc091a805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc]
      [ 1227.455594]  [<ffffffff95e42700>] vfs_write+0xc0/0x1f0
      [ 1227.458393]  [<ffffffff95e4351f>] SyS_write+0x7f/0xf0
      [ 1227.461229]  [<ffffffff96377ddb>] system_call_fastpath+0x22/0x27
      [ 1227.464742] ---[ end trace 7bf73ec94c8b9c81 ]---
      [ 1227.467819] ------------[ cut here ]------------
      [ 1227.470848] WARNING: CPU: 0 PID: 8312 at lib/list_debug.c:36 __list_add+0x8a/0xc0
      [ 1227.474738] list_add double add: new=ffff9a3576864a18, prev=ffff9a3576864a18, next=ffff9a3535dfa518.
      [ 1227.479284] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops iosf_mbi crc32_pclmul drm ppdev ghash_clmulni_intel cryptd joydev pcspkr i2c_piix4 drm_panel_orientation_quirks virtio_balloon parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw virtio_pci virtio_ring virtio
      [ 1227.509176] CPU: 0 PID: 8312 Comm: lctl Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1
      [ 1227.516423] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 1227.519942] Call Trace:
      [ 1227.522420]  [<ffffffff96365147>] dump_stack+0x19/0x1b
      [ 1227.525517]  [<ffffffff95c98848>] __warn+0xd8/0x100
      [ 1227.528534]  [<ffffffff95c988cf>] warn_slowpath_fmt+0x5f/0x80
      [ 1227.531673]  [<ffffffff95dea874>] ? handle_pte_fault+0x2f4/0xd10
      [ 1227.534916]  [<ffffffff95f96cca>] __list_add+0x8a/0xc0
      [ 1227.537959]  [<ffffffffc091b2c3>] nrs_tbf_ctl+0x433/0x580 [ptlrpc]
      [ 1227.541268]  [<ffffffffc0909af9>] nrs_policy_ctl+0x119/0x2c0 [ptlrpc]
      [ 1227.544551]  [<ffffffffc090c22d>] ptlrpc_nrs_policy_control+0xbd/0x2d0 [ptlrpc]
      [ 1227.547998]  [<ffffffffc091a805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc]
      [ 1227.551691]  [<ffffffff95e42700>] vfs_write+0xc0/0x1f0
      [ 1227.554551]  [<ffffffff95e4351f>] SyS_write+0x7f/0xf0
      [ 1227.557286]  [<ffffffff96377ddb>] system_call_fastpath+0x22/0x27
      [ 1227.560224] ---[ end trace 7bf73ec94c8b9c82 ]---
      [ 1257.728282] ------------[ cut here ]------------
      [ 1257.732804] WARNING: CPU: 0 PID: 8496 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
      [ 1257.738365] list_del corruption. prev->next should be ffff9a3535dfa818, but was ffff9a3576864b18
      [ 1257.743840] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops iosf_mbi crc32_pclmul drm ppdev ghash_clmulni_intel cryptd joydev pcspkr i2c_piix4 drm_panel_orientation_quirks virtio_balloon parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw virtio_pci virtio_ring virtio
      [ 1257.777458] CPU: 0 PID: 8496 Comm: lctl Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1
      [ 1257.784986] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 1257.788640] Call Trace:
      [ 1257.791381]  [<ffffffff96365147>] dump_stack+0x19/0x1b
      [ 1257.794802]  [<ffffffff95c98848>] __warn+0xd8/0x100
      [ 1257.797962]  [<ffffffff95c988cf>] warn_slowpath_fmt+0x5f/0x80
      [ 1257.801135]  [<ffffffff95f96da1>] __list_del_entry+0xa1/0xd0
      [ 1257.804190]  [<ffffffffc091b19e>] nrs_tbf_ctl+0x30e/0x580 [ptlrpc]
      [ 1257.807269]  [<ffffffffc0909af9>] nrs_policy_ctl+0x119/0x2c0 [ptlrpc]
      [ 1257.810451]  [<ffffffffc090c27f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
      [ 1257.813840]  [<ffffffffc091a805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc]
      [ 1257.817437]  [<ffffffff95e42700>] vfs_write+0xc0/0x1f0
      [ 1257.820276]  [<ffffffff95e4351f>] SyS_write+0x7f/0xf0
      [ 1257.823071]  [<ffffffff96377ddb>] system_call_fastpath+0x22/0x27
      [ 1257.826456] ---[ end trace 7bf73ec94c8b9c83 ]---
      [ 1257.829223] ------------[ cut here ]------------
      [ 1257.832312] WARNING: CPU: 0 PID: 8496 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
      [ 1257.836504] list_del corruption. prev->next should be ffff9a3535dfa518, but was ffff9a3576864a18
      [ 1257.840931] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops iosf_mbi crc32_pclmul drm ppdev ghash_clmulni_intel cryptd joydev pcspkr i2c_piix4 drm_panel_orientation_quirks virtio_balloon parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw virtio_pci virtio_ring virtio
      [ 1257.869996] CPU: 0 PID: 8496 Comm: lctl Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1
      [ 1257.877025] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 1257.880482] Call Trace:
      [ 1257.882843]  [<ffffffff96365147>] dump_stack+0x19/0x1b
      [ 1257.885960]  [<ffffffff95c98848>] __warn+0xd8/0x100
      [ 1257.888990]  [<ffffffff95c988cf>] warn_slowpath_fmt+0x5f/0x80
      [ 1257.892268]  [<ffffffff95f96da1>] __list_del_entry+0xa1/0xd0
      [ 1257.895832]  [<ffffffffc091b19e>] nrs_tbf_ctl+0x30e/0x580 [ptlrpc]
      [ 1257.901397]  [<ffffffffc0909af9>] nrs_policy_ctl+0x119/0x2c0 [ptlrpc]
      [ 1257.906565]  [<ffffffffc090c22d>] ptlrpc_nrs_policy_control+0xbd/0x2d0 [ptlrpc]
      [ 1257.911435]  [<ffffffffc091a805>] ptlrpc_lprocfs_nrs_tbf_rule_seq_write+0x9a5/0x1030 [ptlrpc]
      [ 1257.916663]  [<ffffffff95e42700>] vfs_write+0xc0/0x1f0
      [ 1257.920614]  [<ffffffff95e4351f>] SyS_write+0x7f/0xf0
      [ 1257.924491]  [<ffffffff96377ddb>] system_call_fastpath+0x22/0x27
      [ 1257.928491] ---[ end trace 7bf73ec94c8b9c84 ]---
      [ 1259.206242] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
      [ 1259.211189] IP: [<ffffffffc054c328>] cfs_match_nid+0x48/0xd0 [lnet]
      [ 1259.215625] PGD 0 
      [ 1259.218265] Oops: 0000 [#1] SMP 
      [ 1259.221311] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) mbcache jbd2 obdclass(OE) lnet(OE) libcfs(OE) crc_t10dif crct10dif_generic sunrpc cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops iosf_mbi crc32_pclmul drm ppdev ghash_clmulni_intel cryptd joydev pcspkr i2c_piix4 drm_panel_orientation_quirks virtio_balloon parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw virtio_pci virtio_ring virtio
      [ 1259.248355] CPU: 0 PID: 1413 Comm: mdt00_001 Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-957.27.2.el7_lustre.ddn1.x86_64 #1
      [ 1259.256061] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 1259.259755] task: ffff9a357a2cc100 ti: ffff9a3579e10000 task.ti: ffff9a3579e10000
      [ 1259.264015] RIP: 0010:[<ffffffffc054c328>]  [<ffffffffc054c328>] cfs_match_nid+0x48/0xd0 [lnet]
      [ 1259.268745] RSP: 0018:ffff9a3579e13c40  EFLAGS: 00010217
      [ 1259.272250] RAX: ffffffffc029d970 RBX: ffff9a3535dfa528 RCX: 0000000000000000
      [ 1259.276331] RDX: 0000000000000001 RSI: ffff9a3535dfa528 RDI: 000200000a0001dd
      [ 1259.280306] RBP: ffff9a3579e13c78 R08: 0000000000000002 R09: 0000000000000000
      [ 1259.284372] R10: ffffffffc0659fe0 R11: ffff9a357b10f060 R12: 0000000000000002
      [ 1259.288421] R13: ffff9a35768b8200 R14: ffff9a357b3cb020 R15: 0000000000000000
      [ 1259.292487] FS:  0000000000000000(0000) GS:ffff9a357fc00000(0000) knlGS:0000000000000000
      [ 1259.296743] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1259.300290] CR2: 0000000000000028 CR3: 00000000366c4000 CR4: 00000000001606f0
      [ 1259.304279] Call Trace:
      [ 1259.306816]  [<ffffffffc0912169>] nrs_tbf_nid_rule_match+0x19/0x20 [ptlrpc]
      [ 1259.310751]  [<ffffffffc0911d7c>] nrs_tbf_rule_match+0x6c/0xd0 [ptlrpc]
      [ 1259.314523]  [<ffffffffc09155a7>] nrs_tbf_res_get+0xb7/0x510 [ptlrpc]
      [ 1259.318206]  [<ffffffffc0907992>] nrs_resource_get+0x82/0x100 [ptlrpc]
      [ 1259.322038]  [<ffffffffc0907f10>] nrs_resource_get_safe+0x80/0xf0 [ptlrpc]
      [ 1259.325951]  [<ffffffffc090ba73>] ptlrpc_nrs_req_initialize+0x83/0x100 [ptlrpc]
      [ 1259.329934]  [<ffffffffc08d69e1>] ptlrpc_server_handle_req_in+0x751/0xd60 [ptlrpc]
      [ 1259.333991]  [<ffffffffc08dab1d>] ptlrpc_main+0xaad/0x1460 [ptlrpc]
      [ 1259.337474]  [<ffffffff95cd1ad0>] ? finish_task_switch+0x50/0x1c0
      [ 1259.341009]  [<ffffffffc08da070>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
      [ 1259.344963]  [<ffffffff95cc2e81>] kthread+0xd1/0xe0
      [ 1259.348089]  [<ffffffff95cc2db0>] ? insert_kthread_work+0x40/0x40
      [ 1259.351584]  [<ffffffff96377c37>] ret_from_fork_nospec_begin+0x21/0x21
      [ 1259.355185]  [<ffffffff95cc2db0>] ? insert_kthread_work+0x40/0x40
      [ 1259.358637] Code: c1 ec 10 0f b7 c0 48 89 f3 48 83 ec 10 4c 8b 3e 48 89 7d c8 89 45 d0 49 39 f7 75 0f eb 7d 0f 1f 44 00 00 4d 8b 3f 49 39 df 74 70 <49> 8b 47 28 44 39 20 75 ef 8b 55 d0 41 39 57 30 75 e6 41 8b 57 
      [ 1259.369540] RIP  [<ffffffffc054c328>] cfs_match_nid+0x48/0xd0 [lnet]
      [ 1259.373156]  RSP <ffff9a3579e13c40>
      [ 1259.375776] CR2: 0000000000000028
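
To make the warning sequence above easier to read, here is a minimal userspace sketch that replays the pointer relationships the log reports: `__list_add` called with `new == prev`, while `next->prev` still points at another node, followed by a `list_del` on that `next` entry. It is only a simplified model of the kernel's list debug checks, written in Python for illustration. The names `ListHead`, `checked_list_add`, `checked_list_del` and `replay` are hypothetical, and the sketch does not attempt to show how `nrs_tbf_ctl` reached this state.

```python
class ListHead:
    """Stand-in for the kernel's struct list_head (circular, doubly linked)."""

    def __init__(self, name):
        self.name = name
        self.next = self
        self.prev = self


def checked_list_add(new, prev, nxt):
    """Model of a debug-checked __list_add(): collect warnings, then link anyway."""
    warnings = []
    if nxt.prev is not prev:
        warnings.append(
            f"list_add corruption: next->prev should be prev ({prev.name}), "
            f"but was {nxt.prev.name}"
        )
    if prev.next is not nxt:
        warnings.append(
            f"list_add corruption: prev->next should be next ({nxt.name}), "
            f"but was {prev.next.name}"
        )
    if new is prev or new is nxt:
        warnings.append(
            f"list_add double add: new={new.name}, prev={prev.name}, next={nxt.name}"
        )
    # Same assignment order as the kernel helper.
    nxt.prev = new
    new.next = nxt
    new.prev = prev
    prev.next = new
    return warnings


def checked_list_del(entry):
    """Model of a debug-checked __list_del_entry(): warn and bail out on mismatch."""
    prev, nxt = entry.prev, entry.next
    if prev.next is not entry:
        return [
            f"list_del corruption: prev->next should be {entry.name}, "
            f"but was {prev.next.name}"
        ]
    if nxt.prev is not entry:
        return [
            f"list_del corruption: next->prev should be {entry.name}, "
            f"but was {nxt.prev.name}"
        ]
    nxt.prev = prev
    prev.next = nxt
    return []


def replay():
    p, n, a = ListHead("P"), ListHead("N"), ListHead("A")
    # P and N form a healthy two-element ring.
    p.next = p.prev = n
    n.next = n.prev = p
    # A's next pointer refers to N, but N does not point back at A.
    a.next = n
    # A is passed as both `new` and `prev`, as in the "double add" warning.
    add_warnings = checked_list_add(a, a, n)
    # Removing N afterwards trips the list_del check seen about 30 s later in the log.
    del_warnings = checked_list_del(n)
    return add_warnings, del_warnings, a


if __name__ == "__main__":
    add_warnings, del_warnings, a = replay()
    for line in add_warnings + del_warnings:
        print(line)
    print("A points at itself:", a.next is a and a.prev is a)
```

In this model the add leaves `A` linked only to itself while `N.prev` points at `A`, so a walk that starts from `A` never reaches the rest of the list, and the later delete of `N` is refused because `A.next` is no longer `N`.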
      
      

              People

              Assignee:
              qian_wc Qian Yingjin
              Reporter:
              lixi_wc Li Xi
