[LU-12194] clients getting soft lockups on 2.10.7 Created: 18/Apr/19  Updated: 10/Dec/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.7
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Campbell Mcleay (Inactive) Assignee: Yang Sheng
Resolution: Unresolved Votes: 0
Labels: None
Environment:

EL 7.4.1708


Attachments: File bravo2-soft-lockups.gz     HTML File spt-table-data-bravo4    
Issue Links:
Related
is related to LU-11895 CPU lockup in LNetMDUnlink Open
is related to LU-12667 Read doesn't perform well in complex ... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Getting occasional soft lockups on 2.10.7 clients

kernel: NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [ptlrpcd_01_08:11711]



 Comments   
Comment by Alex Zhuravlev [ 18/Apr/19 ]

it would be very helpful if you can provide backtraces.

Comment by Campbell Mcleay (Inactive) [ 18/Apr/19 ]

Do we still need to set lru size on the MDS? We have:

cmcl@mds1 ~ -bash$ sudo lctl get_param 'ldlm.namespaces.*.lru_size'
ldlm.namespaces.MGC10.21.22.50@tcp.lru_size=3200
ldlm.namespaces.MGS.lru_size=3200
ldlm.namespaces.bravo-MDT0000-lwp-MDT0000.lru_size=0
ldlm.namespaces.mdt-bravo-MDT0000_UUID.lru_size=3200

Comment by Campbell Mcleay (Inactive) [ 18/Apr/19 ]

A couple of backtraces:

Apr 16 02:12:47 bravo2 kernel: NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [ptlrpcd_01_02:11705]
Apr 16 02:12:47 bravo2 kernel: Modules linked in: osc(OE) mgc(OE) lustre(OE) lmv(OE) fld(OE) mdc(OE) fid(OE) lov(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) vfat fat mpt3sas 8d
Apr 16 02:12:47 bravo2 kernel: stp dm_multipath llc serio_raw raid_class myri10ge scsi_transport_sas bnx2 drm_panel_orientation_quirks dca sunrpc dm_mirror dm_region_hash dm_log dm_mod [last l]
Apr 16 02:12:47 bravo2 kernel: CPU: 5 PID: 11705 Comm: ptlrpcd_01_02 Kdump: loaded Tainted: G IOEL ------------ 3.10.0-957.1.3.el7.x86_64 #1
Apr 16 02:12:47 bravo2 kernel: Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 6.4.0 07/23/2013
Apr 16 02:12:47 bravo2 kernel: task: ffff9ff40efb6180 ti: ffff9feff6e88000 task.ti: ffff9feff6e88000
Apr 16 02:12:47 bravo2 kernel: RIP: 0010:[<ffffffff9cd121e6>] [<ffffffff9cd121e6>] native_queued_spin_lock_slowpath+0x126/0x200
Apr 16 02:12:47 bravo2 kernel: RSP: 0018:ffff9feff6e8bb78 EFLAGS: 00000246
Apr 16 02:12:47 bravo2 kernel: RAX: 0000000000000000 RBX: ffffffffc0d0ca40 RCX: 0000000000290000
Apr 16 02:12:47 bravo2 kernel: RDX: ffff9ff497c9b780 RSI: 0000000000a90001 RDI: ffff9ffa8da6c640
Apr 16 02:12:47 bravo2 kernel: RBP: ffff9feff6e8bb78 R08: ffff9ff497a9b780 R09: 0000000000000000
Apr 16 02:12:47 bravo2 kernel: R10: 0000000000000000 R11: 000000000000000f R12: ffff9ff2dad40a00
Apr 16 02:12:47 bravo2 kernel: R13: 0005ca25f87f7110 R14: ffff9ff2dc03a100 R15: 000000000000000a
Apr 16 02:12:47 bravo2 kernel: FS: 0000000000000000(0000) GS:ffff9ff497a80000(0000) knlGS:0000000000000000
Apr 16 02:12:47 bravo2 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 16 02:12:47 bravo2 kernel: CR2: 00007f76653c03cc CR3: 00000000a7c10000 CR4: 00000000000207e0
Apr 16 02:12:47 bravo2 kernel: Call Trace:
Apr 16 02:12:47 bravo2 kernel: [<ffffffff9d35bfcb>] queued_spin_lock_slowpath+0xb/0xf
Apr 16 02:12:47 bravo2 kernel: [<ffffffff9d36a480>] _raw_spin_lock+0x20/0x30
Apr 16 02:12:47 bravo2 kernel: [<ffffffffc098bc18>] cfs_percpt_lock+0x58/0x110 [libcfs]
Apr 16 02:12:47 bravo2 kernel: [<ffffffffc0a05f08>] LNetMDUnlink+0x78/0x180 [lnet]
Apr 16 02:12:47 bravo2 kernel: [<ffffffffc0c9df2f>] ptlrpc_unregister_reply+0xbf/0x790 [ptlrpc]
Apr 16 02:12:47 bravo2 kernel: [<ffffffffc0ca2c1a>] ptlrpc_expire_one_request+0xba/0x480 [ptlrpc]
Apr 16 02:12:47 bravo2 kernel: [<ffffffffc0ca308f>] ptlrpc_expired_set+0xaf/0x1a0 [ptlrpc]
Apr 16 02:12:47 bravo2 kernel: [<ffffffffc0cd333c>] ptlrpcd+0x29c/0x550 [ptlrpc]
5 c9 74 04 41 0f 18 09 8b 17 0f b7 c2

Apr 16 02:13:03 bravo2 kernel: NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [ptlrpcd_01_10:11713]
Apr 16 02:13:03 bravo2 kernel: Modules linked in: osc(OE) mgc(OE) lustre(OE) lmv(OE) fld(OE) mdc(OE) fid(OE) lov(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) vfat fat mpt3sas mp
tctl mptbase nfsv3 nfs fscache dell_rbu bonding intel_powerclamp coretemp kvm acpi_power_meter joydev ipmi_si ipmi_devintf iTCO_wdt irqbypass ipmi_msghandler sg iTCO_vendor_support gpio_ich dcd
bas wmi i7core_edac lpc_ich nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi mgag200 i2c_algo_bit drm_kms_helpe
r syscopyarea sysfillrect sysimgblt fb_sys_fops ttm scsi_transport_iscsi crct10dif_pclmul ata_piix crct10dif_common crc32_pclmul crc32c_intel drm ghash_clmulni_intel libata mpt2sas aesni_intel
8021q lrw gf128mul garp glue_helper ablk_helper mrp cryptd
Apr 16 02:13:03 bravo2 kernel: stp dm_multipath llc serio_raw raid_class myri10ge scsi_transport_sas bnx2 drm_panel_orientation_quirks dca sunrpc dm_mirror dm_region_hash dm_log dm_mod [last un
loaded: usb_storage]
Apr 16 02:13:03 bravo2 kernel: CPU: 3 PID: 11713 Comm: ptlrpcd_01_10 Kdump: loaded Tainted: G IOEL ------------ 3.10.0-957.1.3.el7.x86_64 #1
Apr 16 02:13:03 bravo2 kernel: Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 6.4.0 07/23/2013
Apr 16 02:13:03 bravo2 kernel: task: ffff9ff40ffb0000 ti: ffff9ff480798000 task.ti: ffff9ff480798000
Apr 16 02:13:03 bravo2 kernel: RIP: 0010:[<ffffffff9cd121e2>] [<ffffffff9cd121e2>] native_queued_spin_lock_slowpath+0x122/0x200
Apr 16 02:13:03 bravo2 kernel: RSP: 0018:ffff9ff48079bb78 EFLAGS: 00000246
Apr 16 02:13:03 bravo2 kernel: RAX: 0000000000000000 RBX: ffffffffc0d0ca40 RCX: 0000000000190000
Apr 16 02:13:03 bravo2 kernel: RDX: ffff9ffaaf65b780 RSI: 0000000000110001 RDI: ffff9ff43a7214c0
Apr 16 02:13:03 bravo2 kernel: RBP: ffff9ff48079bb78 R08: ffff9ff497a5b780 R09: 0000000000000000
Apr 16 02:13:03 bravo2 kernel: R10: 0000000000000000 R11: 000000000000000f R12: ffff9feeb730c400
Apr 16 02:13:03 bravo2 kernel: R13: 0005ca25f78289c0 R14: ffff9feeeb73b300 R15: 000000000000000a
Apr 16 02:13:03 bravo2 kernel: FS: 0000000000000000(0000) GS:ffff9ff497a40000(0000) knlGS:0000000000000000
Apr 16 02:13:03 bravo2 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 16 02:13:03 bravo2 kernel: CR2: 00007fe1cc1a4490 CR3: 00000000a7c10000 CR4: 00000000000207e0
Apr 16 02:13:03 bravo2 kernel: Call Trace:
Apr 16 02:13:03 bravo2 kernel: [<ffffffff9d35bfcb>] queued_spin_lock_slowpath+0xb/0xf
Apr 16 02:13:03 bravo2 kernel: [<ffffffff9d36a480>] _raw_spin_lock+0x20/0x30
Apr 16 02:13:03 bravo2 kernel: [<ffffffffc098bc18>] cfs_percpt_lock+0x58/0x110 [libcfs]
Apr 16 02:13:03 bravo2 kernel: [<ffffffffc0a05f08>] LNetMDUnlink+0x78/0x180 [lnet]
Apr 16 02:13:03 bravo2 kernel: [<ffffffffc0c9df2f>] ptlrpc_unregister_reply+0xbf/0x790 [ptlrpc]
Apr 16 02:13:03 bravo2 kernel: [<ffffffffc0ca2c1a>] ptlrpc_expire_one_request+0xba/0x480 [ptlrpc]
Apr 16 02:13:03 bravo2 kernel: [<ffffffffc0ca308f>] ptlrpc_expired_set+0xaf/0x1a0 [ptlrpc]
Apr 16 02:13:03 bravo2 kernel: [<ffffffffc0cd333c>] ptlrpcd+0x29c/0x550 [ptlrpc]
Apr 16 02:13:03 bravo2 kernel: [<ffffffff9ccd67b0>] ? wake_up_state+0x20/0x20
Apr 16 02:13:03 bravo2 kernel: [<ffffffffc0cd30a0>] ? ptlrpcd_check+0x5e0/0x5e0 [ptlrpc]
Apr 16 02:13:03 bravo2 kernel: [<ffffffff9ccc1c31>] kthread+0xd1/0xe0
Apr 16 02:13:03 bravo2 kernel: [<ffffffff9ccc1b60>] ? insert_kthread_work+0x40/0x40
Apr 16 02:13:03 bravo2 kernel: [<ffffffff9d374c37>] ret_from_fork_nospec_begin+0x21/0x21
Apr 16 02:13:03 bravo2 kernel: [<ffffffff9ccc1b60>] ? insert_kthread_work+0x40/0x40
d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b

 

Comment by Peter Jones [ 18/Apr/19 ]

Yang Sheng

Can you please advise?

Thanks

Peter

Comment by Campbell Mcleay (Inactive) [ 25/Apr/19 ]

Please let me know if you need any additional information.

Thanks,

Campbell

Comment by Yang Sheng [ 25/Apr/19 ]

Hi, Campbell,

Did you collect a sysrq-t while the soft lockup was occurring?

Thanks,
YangSheng

Comment by Campbell Mcleay (Inactive) [ 26/Apr/19 ]

Hi YangSheng,

They only occur occasionally (maybe once a day) so it is difficult to do it at the time of the lockup.

Kind regards,

Campbell

Comment by Campbell Mcleay (Inactive) [ 30/Apr/19 ]

Hi YangSheng,

Any other suggestions as to how we can find out what is going on here?

Regards,

Campbell

Comment by Yang Sheng [ 30/Apr/19 ]

Hi, Campbell,

For a soft lockup issue, collecting a sysrq-t is the best approach, so we can find out which thread causes the problem. I think you can deploy a script that monitors the dmesg output and triggers a sysrq-t when a soft lockup occurs.

Thanks,
YangSheng
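
[Editor's note] The monitoring approach suggested above can be sketched as a small watcher script. This is an illustrative sketch only, not from the ticket; it assumes root privileges, that SysRq is enabled via /proc/sys/kernel/sysrq, and a hypothetical /var/tmp output path:

```shell
#!/bin/sh
# Poll the kernel log once a second; on the first "soft lockup" message,
# trigger sysrq-t so every task's backtrace is dumped to the kernel log,
# then save the log for later analysis.
while sleep 1; do
    if dmesg | grep -q 'soft lockup'; then
        echo t > /proc/sysrq-trigger                  # SysRq-t: dump all task states
        dmesg > "/var/tmp/softlockup-$(date +%s).log" # snapshot the kernel log
        break
    fi
done
```

In practice the grep pattern matches the watchdog line quoted in the description ("NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s!").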

Comment by Campbell Mcleay (Inactive) [ 02/May/19 ]

Hi YangSheng,

Attached are some sysrq-t dumps from when the soft lockups occurred.

Kind regards,

Campbell

bravo2-soft-lockups.gz

Comment by Yang Sheng [ 06/May/19 ]

Hi, Campbell,

From stack trace:

May  2 02:24:47 bravo2 kernel: NMI watchdog: BUG: soft lockup - CPU#12 stuck for 22s! [ptlrpcd_00_04:11695]
May  2 02:24:47 bravo2 kernel: Modules linked in: osc(OE) mgc(OE) lustre(OE) lmv(OE) fld(OE) mdc(OE) fid(OE) lov(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) vfat fat mpt3sas mptctl mptbase nfsv3 nfs fscache dell_rbu bonding intel_powerclamp coretemp kvm acpi_power_meter joydev ipmi_si ipmi_devintf iTCO_wdt irqbypass ipmi_msghandler sg iTCO_vendor_support gpio_ich dcdbas wmi i7core_edac lpc_ich nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm scsi_transport_iscsi crct10dif_pclmul ata_piix crct10dif_common crc32_pclmul crc32c_intel drm ghash_clmulni_intel libata mpt2sas aesni_intel 8021q lrw gf128mul garp glue_helper ablk_helper mrp cryptd
May  2 02:24:47 bravo2 kernel: stp dm_multipath llc serio_raw raid_class myri10ge scsi_transport_sas bnx2 drm_panel_orientation_quirks dca sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: usb_storage]
May  2 02:24:47 bravo2 kernel: CPU: 12 PID: 11695 Comm: ptlrpcd_00_04 Kdump: loaded Tainted: G        W IOEL ------------   3.10.0-957.1.3.el7.x86_64 #1
May  2 02:24:47 bravo2 kernel: Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 6.4.0 07/23/2013
May  2 02:24:47 bravo2 kernel: task: ffff9ff4036a30c0 ti: ffff9ff42d388000 task.ti: ffff9ff42d388000
May  2 02:24:47 bravo2 kernel: RIP: 0010:[<ffffffffc09f7a08>]  [<ffffffffc09f7a08>] lnet_res_lh_lookup+0x48/0x70 [lnet]
May  2 02:24:47 bravo2 kernel: RSP: 0018:ffff9ff42d38bbc0  EFLAGS: 00000206
May  2 02:24:47 bravo2 kernel: RAX: 0000000000000000 RBX: ffffffffffffff10 RCX: ffffb22686ad0f90
May  2 02:24:47 bravo2 kernel: RDX: ffff9fef08190610 RSI: 00000008d13a57cd RDI: ffff9feeb344f000
May  2 02:24:47 bravo2 kernel: RBP: ffff9ff42d38bbc0 R08: ffff9ffaaf79b780 R09: ffff9ff497c1b780
May  2 02:24:47 bravo2 kernel: R10: 0000000000000000 R11: 000000000000000f R12: 0000000000010001
May  2 02:24:47 bravo2 kernel: R13: ffff9ffaaf61b780 R14: 0000000000610000 R15: 0000000000000000
May  2 02:24:47 bravo2 kernel: FS:  0000000000000000(0000) GS:ffff9ffaaf780000(0000) knlGS:0000000000000000
May  2 02:24:47 bravo2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  2 02:24:47 bravo2 kernel: CR2: 00007f3c4fc10000 CR3: 00000000a7c10000 CR4: 00000000000207e0
May  2 02:24:47 bravo2 kernel: Call Trace:
May  2 02:24:47 bravo2 kernel: [<ffffffffc0a05f3c>] LNetMDUnlink+0xac/0x180 [lnet]
May  2 02:24:47 bravo2 kernel: [<ffffffffc0c9df2f>] ptlrpc_unregister_reply+0xbf/0x790 [ptlrpc]
May  2 02:24:47 bravo2 kernel: [<ffffffffc0ca2c1a>] ptlrpc_expire_one_request+0xba/0x480 [ptlrpc]
May  2 02:24:47 bravo2 kernel: [<ffffffffc0ca308f>] ptlrpc_expired_set+0xaf/0x1a0 [ptlrpc]
May  2 02:24:47 bravo2 kernel: [<ffffffffc0cd333c>] ptlrpcd+0x29c/0x550 [ptlrpc]
May  2 02:24:47 bravo2 kernel: [<ffffffff9ccd67b0>] ? wake_up_state+0x20/0x20
May  2 02:24:47 bravo2 kernel: [<ffffffffc0cd30a0>] ? ptlrpcd_check+0x5e0/0x5e0 [ptlrpc]
May  2 02:24:47 bravo2 kernel: [<ffffffff9ccc1c31>] kthread+0xd1/0xe0
May  2 02:24:47 bravo2 kernel: [<ffffffff9ccc1b60>] ? insert_kthread_work+0x40/0x40
May  2 02:24:47 bravo2 kernel: [<ffffffff9d374c37>] ret_from_fork_nospec_begin+0x21/0x21
May  2 02:24:47 bravo2 kernel: [<ffffffff9ccc1b60>] ? insert_kthread_work+0x40/0x40
May  2 02:24:47 bravo2 kernel: Code: 00 48 89 f2 83 c1 02 48 d3 ea 48 89 d1 81 e1 ff 0f 00 00 48 c1 e1 04 48 03 4f 20 48 8b 11 48 39 ca 75 10 eb 17 66 0f 1f 44 00 00 <48> 8b 12 48 39 ca 74 10 48 39 72 10 75 f2 48 89 d0 5d c3 0f 1f

If this thread holds the lock and loops over the list for a long time, the soft lockup could be triggered. But it is still not clear why it spends so long there. Would it be possible to apply a debug patch at your site?

Thanks,
YangSheng

Comment by Campbell Mcleay (Inactive) [ 07/May/19 ]

Hi YangSheng,

Yes, we can apply a debug patch - please let me know what you need me to do.

Kind regards,

Campbell

Comment by Yang Sheng [ 07/May/19 ]

Hi, Campbell,

That is great. So you are running the standard 2.10.7 release, without any extra patches, at your site?

Thanks,
YangSheng

Comment by Campbell Mcleay (Inactive) [ 07/May/19 ]

Yes, it is a standard 2.10.7 release, with no patches.

Regards,

Campbell

Comment by Gerrit Updater [ 10/May/19 ]

Yang Sheng (ys@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34845
Subject: LU-12194 lnet: debug patch
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 9c51db3b1fd9d9c47c60da79c9f5fd3f27533c27

Comment by Yang Sheng [ 13/May/19 ]

Hi, Campbell,

The debug patch has passed testing. Do you have a chance to install it at your site?

Thanks,
YangSheng

Comment by Campbell Mcleay (Inactive) [ 14/May/19 ]

Hi YangSheng,

Do I need to just build and install the client packages, or do I have to build and install it for the OSSs and MDS?

Regards,

Campbell

Comment by Patrick Farrell (Inactive) [ 14/May/19 ]

Campbell,

This particular patch is only relevant where you're getting the lockups, so in this case, clients.

Comment by Yang Sheng [ 15/May/19 ]

Hi, Campbell,

As Patrick pointed out, only the clients need this patch.

Thanks,
YangSheng

Comment by Campbell Mcleay (Inactive) [ 15/May/19 ]

Thanks - should I use lustre-client-debuginfo or will lustre-client suffice?

Comment by Patrick Farrell (Inactive) [ 15/May/19 ]

Well, this will be self-regulating - if you just install lustre-client-debuginfo, it won't work

[edit]

Reading this again, I see you were just suggesting it as an additional package.  Sorry for my flippancy.

[/edit]

More seriously:
Just lustre-client.  lustre-client-debuginfo is additional debug information for the lustre-client package, used when examining a crash, not at runtime (and it doesn't contain the lustre-client stuff).

Comment by Campbell Mcleay (Inactive) [ 16/May/19 ]

No problem, thanks Patrick. I'm just having issues with configure erroring out (with no useful error, not even in the config log). It did build successfully against a different kernel source tree than the running kernel, but obviously that's no use. Will let you know when I have it patched.

checking for /lib/modules/3.10.0-957.1.3.el7.x86_64/source/include/linux/kconfig.h... yes
checking for external module build target... configure: error: unknown; check config.log for details

Comment by Campbell Mcleay (Inactive) [ 16/May/19 ]

Just to check: it should build against kernel version 3.10.0-957.1.3.el7.x86_64?

Comment by Yang Sheng [ 16/May/19 ]

Yes, it can be built against this kernel version. Just check whether your kernel tree has been prepared properly.

Comment by Campbell Mcleay (Inactive) [ 20/May/19 ]

I think a feature missing from EL 7.4 caused the build to break against kernel 3.10.0-957, since it built fine against kernel 3.10.0-693, for example. I ended up building it on a 7.6 host and that worked, so it is now installed and I'll let you know if we get any results.

Cheers,
Campbell

Comment by Yang Sheng [ 21/May/19 ]

Hi, Campbell,

Thanks for the info. Please monitor the output of /proc/lnet_spt/spt_table. You can collect it periodically (I think one minute is enough), especially after a soft lockup.

Thanks,
YangSheng
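
[Editor's note] A minimal periodic collector for that file might look like the following. This is a sketch, not from the ticket; the /proc/lnet_spt/spt_table path comes from the comment above (it exists only with the debug patch applied), and the /var/tmp/spt_table output directory is a hypothetical choice:

```shell
#!/bin/sh
# Copy the spt_table stats once a minute, timestamping each snapshot so
# the samples nearest a soft lockup can be matched against the syslog.
OUT=/var/tmp/spt_table
mkdir -p "$OUT"
while sleep 60; do
    cp /proc/lnet_spt/spt_table "$OUT/spt_table.$(date +%Y%m%d-%H%M%S)"
done
```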

Comment by Campbell Mcleay (Inactive) [ 21/May/19 ]

Hi YangSheng,

Attached is some spt_table data from the time of the lockups (there were 8 lockup events). Let me know if you need spt_table data outside these times as well, as I currently have collection triggered by lockups only.

Kind regards,

Campbell

Comment by Yang Sheng [ 22/May/19 ]

Hi, Campbell,

Since the patch gathers the maximum hold time of the cpt lock, the later the data the better.

Thanks,
Yangsheng

Comment by Campbell Mcleay (Inactive) [ 22/May/19 ]

Hi YangSheng,

As it is collecting spt_table data when there are lockups, I assume that it is showing the maximum hold time of the lock on the cpu - or have I got that wrong? Should I just gather the data every minute? Please let me know what periods you will need.

Kind regards,

Campbell

Comment by Yang Sheng [ 22/May/19 ]

Hi, Campbell,

Just the latest data is enough, unless you reload the lnet module after a lockup. From the log, it looks like the delay is not that high.

Thanks,
YangSheng

Comment by Campbell Mcleay (Inactive) [ 22/May/19 ]

Should I use 'lustre_rmmod' and then 'modprobe lustre' after a lockup is detected? And should I keep sending you data after lockups or do you have enough to work with for now?

Kind regards,

Campbell

Comment by Yang Sheng [ 22/May/19 ]

Hi, Campbell,

--Should I use 'lustre_rmmod' and then 'modprobe lustre' after a lockup is detected?
No, please collect the data after a lockup without 'rmmod'.

--should I keep sending you data after lockups or do you have enough to work with for now?
Yes, please send the data after every lockup.

Thanks,
YangSheng

Comment by Campbell Mcleay (Inactive) [ 22/May/19 ]

Latest one (only one cpu core locked up):

LNetEQAlloc:000:0,0,0,0,0,0,0,0,0,0:3:
LNetEQAlloc:001:0,0,0,0,0,0,0,0,0,0:0:
LNetEQAlloc:002:0,0,0,0,0,0,0,0,0,0:0:
LNetMEAttach:000:0,0,0,0,0,0,0,0,0,0:0:
LNetMEAttach:001:1863,90,7,0,0,0,0,0,0,0:94076607:
LNetMEAttach:002:8331,102,44,0,0,0,0,0,0,0:455990518:
LNetMDAttach:000:0,0,0,0,0,0,0,0,0,0:0:
LNetMDAttach:001:2874,67,14,0,0,0,0,0,0,0:94076607:
LNetMDAttach:002:1490,93,62,0,0,0,0,0,0,0:455990518:
LNetSetLazyPortal:000:0,0,0,0,0,0,0,0,0,0:1:
LNetSetLazyPortal:001:0,0,0,0,0,0,0,0,0,0:0:
LNetSetLazyPortal:002:0,0,0,0,0,0,0,0,0,0:0:
lnet_res_lock_current:000:0,0,0,0,0,0,0,0,0,0:0:
lnet_res_lock_current:001:0,0,0,0,0,0,0,0,0,0:208589267:
lnet_res_lock_current:002:0,0,0,0,0,0,0,0,0,0:337491708:
LNetPut:000:0,0,0,0,0,0,0,0,0,0:0:
LNetPut:001:903,904,904,904,213,32,61,62,1,0:208589267:
LNetPut:002:2042,3861,3862,16575,483,40,56,56,8,0:337491708:
lnet_finalize:000:0,0,0,0,0,0,0,0,0,0:0:
lnet_finalize:001:17604,1020,23,0,0,0,0,0,0,0:301706190:
lnet_finalize:002:18562,110,55,0,0,0,0,0,0,0:789653414:
lnet_ptl_match_md:000:0,0,0,0,0,0,0,0,0,0:0:
lnet_ptl_match_md:001:113272,1277,1160,707,0,0,0,0,0,0:94578970:
lnet_ptl_match_md:002:24143,1081,799,52,0,0,0,0,0,0:459115756:
LNetMDUnlink:000:0,0,0,0,0,0,0,0,0,0:0:
LNetMDUnlink:001:6202,11217,15551,15551,319,67,68,69,69,46:93967533:
LNetMDUnlink:002:212311,212312,212314,212314,173,100,101,101,77,80:446618837:
lnet_ptl_match_delay:000:0,0,0,0,0,0,0,0,0,0:0:
lnet_ptl_match_delay:001:62,1,0,0,0,0,0,0,0,0:89980:
lnet_ptl_match_delay:002:36,20,0,0,0,0,0,0,0,0:47777:

Comment by Yang Sheng [ 23/May/19 ]

Hi, Campbell,

Could you please collect data as below:

# lctl get_param cpu_partition_table

Thanks,
YangSheng

Comment by Campbell Mcleay (Inactive) [ 23/May/19 ]

Hi YangSheng,

All clients have:

cpu_partition_table=
0 : 0 2 4 6 8 10 12 14 16 18 20 22
1 : 1 3 5 7 9 11 13 15 17 19 21 23

Regards,
Campbell

Comment by Yang Sheng [ 23/May/19 ]

Hi, Campbell,

Please add this line into /etc/modprobe.d/ko2iblnd.conf.

options libcfs cpu_npartitions=6

Then reload the Lustre modules to verify whether the lockup is still hit. Please confirm the setting took effect with 'lctl get_param cpu_partition_table'.

Thanks,
YangSheng

Comment by Campbell Mcleay (Inactive) [ 23/May/19 ]

Hi YangSheng,

I added the modprobe line and reloaded lustre modules, but it is not working:

May 23 18:39:20 bravo2 kernel: LNet: HW NUMA nodes: 2, HW CPU cores: 24, npartitions: 2
May 23 18:46:16 bravo2 kernel: LNet: HW NUMA nodes: 2, HW CPU cores: 24, npartitions: 2

cpu_partition_table=
0 : 0 2 4 6 8 10 12 14 16 18 20 22
1 : 1 3 5 7 9 11 13 15 17 19 21 23

/etc/modprobe.d/ko2iblnd.conf

alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4 libcfs cpu_npartitions=6

install ko2iblnd /usr/sbin/ko2iblnd-probe

Having a look at what I'm doing wrong

Comment by Yang Sheng [ 24/May/19 ]

Hi, Campbell,

Please add "options libcfs cpu_npartitions=6" as a NEW line. Alternatively, you can run 'modprobe libcfs cpu_npartitions=6' before mounting Lustre, which avoids changing any files.

Thanks,
YangSheng
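
[Editor's note] Put together, the file-free procedure might look like this. A sketch under stated assumptions: the /mnt/lustre mount point and the mds1@tcp:/bravo mount target are hypothetical placeholders for the site's actual values; lustre_rmmod and lctl are the standard Lustre utilities used elsewhere in this ticket:

```shell
# Unmount and unload so libcfs can be reloaded with the new option.
umount /mnt/lustre
lustre_rmmod

# Load libcfs with the CPT count before any other Lustre module pulls it in.
modprobe libcfs cpu_npartitions=6

# Remount and confirm the setting took effect (expect 6 partitions listed).
mount -t lustre mds1@tcp:/bravo /mnt/lustre
lctl get_param cpu_partition_table
```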

Comment by Yang Sheng [ 24/May/19 ]

Hi, Campbell,

I note that you have 2 NUMA nodes, so we need to set the partitions explicitly as below:

options libcfs cpu_pattern=[0,2,4,6,8,10]1[12,14,16,18,20,22]2[1,3,5,7,9,11]3[13,15,17,19,21,23]

Or you can use 'modprobe libcfs cpu_pattern=[0,2,4,6,8,10]1[12,14,16,18,20,22]2[1,3,5,7,9,11]3[13,15,17,19,21,23]'

Thanks,
YangSheng

Comment by Campbell Mcleay (Inactive) [ 24/May/19 ]

Hi YangSheng,

Had to modify it slightly to work as it complained:

May 24 10:38:49 bravo2 kernel: LNetError: 21221:0:(linux-cpu.c:1151:cfs_cpu_init()) Failed to create cptab from pattern '[0,2,4,6,8,10]1[12,14,16,18,20,22]2[1,3,5,7,9,11]3[13,15,17,19,21,23]'

Modified cpu_pattern to have a partition number for the first set, so I have:

alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
options libcfs cpu_npartitions=6
options libcfs cpu_pattern=0[0,2,4,6,8,10]1[12,14,16,18,20,22]2[1,3,5,7,9,11]3[13,15,17,19,21,23]

install ko2iblnd /usr/sbin/ko2iblnd-probe

So I get:

cpu_partition_table=
0 : 0 2 4 6 8 10
1 : 12 14 16 18 20 22
2 : 1 3 5 7 9 11
3 : 13 15 17 19 21 23

Which looks like what we want I assume.

Comment by Yang Sheng [ 24/May/19 ]

Hi, Campbell,

Yes, I am sorry, I had a typo in my comment. Please test with this pattern to see whether the lockup can be reproduced.

BTW: The 'options libcfs cpu_npartitions=6' can be removed.

Thanks,
YangSheng

Comment by Yang Sheng [ 27/May/19 ]

Hi, Campbell,

Could you please tell me the status of your site? Are you still collecting spt_table data?

Thanks,
YangSheng

Comment by Campbell Mcleay (Inactive) [ 04/Jun/19 ]

Hi YangSheng,

I'm not collecting spt_table_data at the moment, but I also haven't seen any soft lockups since the changes were made. So what next from here? Do I just add these options to all clients on 2.10.7? Or is there a patch imminent to prevent the issue with the default CPU topology?

Kind regards,

Campbell

Comment by Yang Sheng [ 04/Jun/19 ]

Hi, Campbell,

I think you can apply this change to all of the clients that might be impacted by this issue. I'll try to push a patch to make this change easier, but I think it could take a long time, so can we close this one first?

BTW: you can go back to your original Lustre version to remove the debug patch.

Thanks,
Yangsheng

Comment by Campbell Mcleay (Inactive) [ 04/Jun/19 ]

Thanks Yangsheng. So the proposed patch will be to modify ko2iblnd.conf?

Comment by Yang Sheng [ 05/Jun/19 ]

Hi, Campbell,

No, it will set the CPTs automatically, so we needn't set them manually. We already do this for UMA nodes, but it looks like we don't on NUMA nodes.

Thanks,
Yangsheng

Comment by Campbell Mcleay (Inactive) [ 06/Jun/19 ]

Hi YangSheng,

What is the general rule for setting cpu_npartitions - is it the number of NUMA-node CPUs divided by the number of NUMA nodes?

Thanks,

Campbell

Comment by Campbell Mcleay (Inactive) [ 12/Jun/19 ]

Hi YangSheng,

Are you able to confirm what the general rule is for partitioning?

Thanks,
Campbell

Comment by Yang Sheng [ 12/Jun/19 ]

Hi, Campbell,

You can refer to the documentation at http://doc.lustre.org/lustre_manual.xhtml#dbdoclet.libcfstuning. But we still don't have a detailed standard for CPT configuration, since it really depends on the situation.

Thanks,
Yangsheng

Comment by Campbell Mcleay (Inactive) [ 25/Jun/19 ]

Hi YangSheng,

I didn't see a patch in the 2.10.7 -> 2.10.8 changelog that will set NUMA topology - you mentioned it may take some time to get this patched - do you think it may get done within the next few months? I'm just wondering whether to wait for the patches and upgrade.

Thanks,

Campbell

Comment by Yang Sheng [ 26/Jun/19 ]

Hi, Campbell,

I am testing the patch in our test cluster. Yes, I think it will land in the next few months. You can set it up via cpu_pattern before then.

Thanks,
YangSheng

Generated at Sat Feb 10 02:50:27 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.