[LU-12194] clients getting soft lockups on 2.10.7 Created: 18/Apr/19 Updated: 10/Dec/20 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Campbell Mcleay (Inactive) | Assignee: | Yang Sheng |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Environment: | EL 7.4.1708 |
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Getting occasional soft lockups on 2.10.7 clients:

kernel: NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [ptlrpcd_01_08:11711] |
| Comments |
| Comment by Alex Zhuravlev [ 18/Apr/19 ] |
|
It would be very helpful if you could provide backtraces. |
| Comment by Campbell Mcleay (Inactive) [ 18/Apr/19 ] |
|
Do we still need to set lru size on the MDS? We have:

cmcl@mds1 ~ -bash$ sudo lctl get_param 'ldlm.namespaces.*.lru_size'
ldlm.namespaces.mdt-bravo-MDT0000_UUID.lru_size=3200
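For reference, this is how we check it today; the set_param line below is only a sketch of how we would change it (setting 0 reportedly lets the LRU size be managed dynamically, but please confirm what is recommended for 2.10):

# Check the current per-namespace LRU size (same command works on clients)
lctl get_param 'ldlm.namespaces.*.lru_size'

# Sketch only: pin the LRU to a fixed size, or set 0 for dynamic sizing
lctl set_param 'ldlm.namespaces.*.lru_size=3200'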
|
| Comment by Campbell Mcleay (Inactive) [ 18/Apr/19 ] |
|
A couple of backtraces:

Apr 16 02:12:47 bravo2 kernel: NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [ptlrpcd_01_02:11705]
Apr 16 02:13:03 bravo2 kernel: NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [ptlrpcd_01_10:11713]
|
| Comment by Peter Jones [ 18/Apr/19 ] |
|
Yang Sheng, can you please advise? Thanks, Peter |
| Comment by Campbell Mcleay (Inactive) [ 25/Apr/19 ] |
|
Please let me know if you need any additional information. Thanks, Campbell |
| Comment by Yang Sheng [ 25/Apr/19 ] |
|
Hi Campbell, did you collect sysrq-t output while the soft lockup was occurring? Thanks, |
| Comment by Campbell Mcleay (Inactive) [ 26/Apr/19 ] |
|
Hi YangSheng, They only occur occasionally (maybe once a day) so it is difficult to do it at the time of the lockup. Kind regards, Campbell |
| Comment by Campbell Mcleay (Inactive) [ 30/Apr/19 ] |
|
Hi YangSheng, Any other suggestions as to how we can find out what is going on here? Regards, Campbell |
| Comment by Yang Sheng [ 30/Apr/19 ] |
|
Hi Campbell, for a soft lockup issue, collecting sysrq-t output is the best approach, so we can find out which thread is causing the problem. You could deploy a script that monitors the dmesg output and triggers sysrq-t when a soft lockup occurs. Thanks, |
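Something along these lines should work as a sketch (it assumes root on the client and the standard sysrq interfaces; the match string and the script shape are just an example):

#!/bin/bash
# Follow the kernel log and dump all task backtraces (sysrq-t) as soon as
# a soft lockup message appears, so the stacks land in the kernel log.
echo 1 > /proc/sys/kernel/sysrq        # make sure the 't' trigger is allowed
dmesg --follow | while read -r line; do
    case "$line" in
    *"soft lockup"*)
        echo t > /proc/sysrq-trigger   # dump task states with backtraces
        ;;
    esac
done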
| Comment by Campbell Mcleay (Inactive) [ 02/May/19 ] |
|
Hi YangSheng, attached are some sysrq-t dumps from when the soft lockups occurred. Kind regards, Campbell (attachment: bravo2-soft-lockups.gz) |
| Comment by Yang Sheng [ 06/May/19 ] |
|
Hi Campbell, from the stack trace:

May 2 02:24:47 bravo2 kernel: NMI watchdog: BUG: soft lockup - CPU#12 stuck for 22s! [ptlrpcd_00_04:11695]
May 2 02:24:47 bravo2 kernel: Modules linked in: osc(OE) mgc(OE) lustre(OE) lmv(OE) fld(OE) mdc(OE) fid(OE) lov(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) vfat fat mpt3sas mptctl mptbase nfsv3 nfs fscache dell_rbu bonding intel_powerclamp coretemp kvm acpi_power_meter joydev ipmi_si ipmi_devintf iTCO_wdt irqbypass ipmi_msghandler sg iTCO_vendor_support gpio_ich dcdbas wmi i7core_edac lpc_ich nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm scsi_transport_iscsi crct10dif_pclmul ata_piix crct10dif_common crc32_pclmul crc32c_intel drm ghash_clmulni_intel libata mpt2sas aesni_intel 8021q lrw gf128mul garp glue_helper ablk_helper mrp cryptd
May 2 02:24:47 bravo2 kernel: stp dm_multipath llc serio_raw raid_class myri10ge scsi_transport_sas bnx2 drm_panel_orientation_quirks dca sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: usb_storage]
May 2 02:24:47 bravo2 kernel: CPU: 12 PID: 11695 Comm: ptlrpcd_00_04 Kdump: loaded Tainted: G W IOEL ------------ 3.10.0-957.1.3.el7.x86_64 #1
May 2 02:24:47 bravo2 kernel: Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 6.4.0 07/23/2013
May 2 02:24:47 bravo2 kernel: task: ffff9ff4036a30c0 ti: ffff9ff42d388000 task.ti: ffff9ff42d388000
May 2 02:24:47 bravo2 kernel: RIP: 0010:[<ffffffffc09f7a08>] [<ffffffffc09f7a08>] lnet_res_lh_lookup+0x48/0x70 [lnet]
May 2 02:24:47 bravo2 kernel: RSP: 0018:ffff9ff42d38bbc0 EFLAGS: 00000206
May 2 02:24:47 bravo2 kernel: RAX: 0000000000000000 RBX: ffffffffffffff10 RCX: ffffb22686ad0f90
May 2 02:24:47 bravo2 kernel: RDX: ffff9fef08190610 RSI: 00000008d13a57cd RDI: ffff9feeb344f000
May 2 02:24:47 bravo2 kernel: RBP: ffff9ff42d38bbc0 R08: ffff9ffaaf79b780 R09: ffff9ff497c1b780
May 2 02:24:47 bravo2 kernel: R10: 0000000000000000 R11: 000000000000000f R12: 0000000000010001
May 2 02:24:47 bravo2 kernel: R13: ffff9ffaaf61b780 R14: 0000000000610000 R15: 0000000000000000
May 2 02:24:47 bravo2 kernel: FS: 0000000000000000(0000) GS:ffff9ffaaf780000(0000) knlGS:0000000000000000
May 2 02:24:47 bravo2 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 2 02:24:47 bravo2 kernel: CR2: 00007f3c4fc10000 CR3: 00000000a7c10000 CR4: 00000000000207e0
May 2 02:24:47 bravo2 kernel: Call Trace:
May 2 02:24:47 bravo2 kernel: [<ffffffffc0a05f3c>] LNetMDUnlink+0xac/0x180 [lnet]
May 2 02:24:47 bravo2 kernel: [<ffffffffc0c9df2f>] ptlrpc_unregister_reply+0xbf/0x790 [ptlrpc]
May 2 02:24:47 bravo2 kernel: [<ffffffffc0ca2c1a>] ptlrpc_expire_one_request+0xba/0x480 [ptlrpc]
May 2 02:24:47 bravo2 kernel: [<ffffffffc0ca308f>] ptlrpc_expired_set+0xaf/0x1a0 [ptlrpc]
May 2 02:24:47 bravo2 kernel: [<ffffffffc0cd333c>] ptlrpcd+0x29c/0x550 [ptlrpc]
May 2 02:24:47 bravo2 kernel: [<ffffffff9ccd67b0>] ? wake_up_state+0x20/0x20
May 2 02:24:47 bravo2 kernel: [<ffffffffc0cd30a0>] ? ptlrpcd_check+0x5e0/0x5e0 [ptlrpc]
May 2 02:24:47 bravo2 kernel: [<ffffffff9ccc1c31>] kthread+0xd1/0xe0
May 2 02:24:47 bravo2 kernel: [<ffffffff9ccc1b60>] ? insert_kthread_work+0x40/0x40
May 2 02:24:47 bravo2 kernel: [<ffffffff9d374c37>] ret_from_fork_nospec_begin+0x21/0x21
May 2 02:24:47 bravo2 kernel: [<ffffffff9ccc1b60>] ? insert_kthread_work+0x40/0x40
May 2 02:24:47 bravo2 kernel: Code: 00 48 89 f2 83 c1 02 48 d3 ea 48 89 d1 81 e1 ff 0f 00 00 48 c1 e1 04 48 03 4f 20 48 8b 11 48 39 ca 75 10 eb 17 66 0f 1f 44 00 00 <48> 8b 12 48 39 ca 74 10 48 39 72 10 75 f2 48 89 d0 5d c3 0f 1f

If this thread holds the lock and loops over the list for a long time, the soft lockup could be triggered. But it is still not clear why it spends so long in there. Would it be possible to apply a debug patch at your site? Thanks, |
| Comment by Campbell Mcleay (Inactive) [ 07/May/19 ] |
|
Hi YangSheng, Yes, we can apply a debug patch - please let me know what you need me to do. Kind regards, Campbell |
| Comment by Yang Sheng [ 07/May/19 ] |
|
Hi Campbell, that is great. So you are using the standard 2.10.7 release without any extra patches at your site? Thanks, |
| Comment by Campbell Mcleay (Inactive) [ 07/May/19 ] |
|
Yes, it is a standard 2.10.7 release, with no patches. Regards, Campbell |
| Comment by Gerrit Updater [ 10/May/19 ] |
|
Yang Sheng (ys@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34845 |
| Comment by Yang Sheng [ 13/May/19 ] |
|
Hi Campbell, the debug patch has passed tests. Would you have a chance to install it at your site? Thanks, |
| Comment by Campbell Mcleay (Inactive) [ 14/May/19 ] |
|
Hi YangSheng, Do I need to just build and install the client packages, or do I have to build and install it for the OSSs and MDS? Regards, Campbell |
| Comment by Patrick Farrell (Inactive) [ 14/May/19 ] |
|
Campbell, This particular patch is only relevant where you're getting the lockups, so in this case, clients. |
| Comment by Yang Sheng [ 15/May/19 ] |
|
Hi Campbell, as Patrick pointed out, only the clients need this patch. Thanks, |
| Comment by Campbell Mcleay (Inactive) [ 15/May/19 ] |
|
Thanks - should I use lustre-client-debuginfo or will lustre-client suffice? |
| Comment by Patrick Farrell (Inactive) [ 15/May/19 ] |
|
Well, this will be self-regulating: if you just install lustre-client-debuginfo, it won't work. [edit] Reading this again, I see you were just suggesting it as an additional package. Sorry for my flippancy. [/edit] More seriously: |
| Comment by Campbell Mcleay (Inactive) [ 16/May/19 ] |
|
No problem, thanks Patrick. Just having issues with configure erroring out (with no useful error, not even in config.log). It did build successfully against a different kernel source tree than the running kernel, but obviously that's no use. Will let you know when I have it patched.

checking for /lib/modules/3.10.0-957.1.3.el7.x86_64/source/include/linux/kconfig.h... yes
checking for external module build target... configure: error: unknown; check config.log for details |
| Comment by Campbell Mcleay (Inactive) [ 16/May/19 ] |
|
Just to check: it should build against kernel version 3.10.0-957.1.3.el7.x86_64? |
| Comment by Yang Sheng [ 16/May/19 ] |
|
Yes, it can be built against this kernel version. Just check whether your kernel tree has been prepared properly. |
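For example, a rough sketch of building against the running kernel's prepared tree (the package name and path below are the usual EL7 ones, so treat them as assumptions about your setup):

# Install the devel tree matching the running kernel
yum install -y kernel-devel-3.10.0-957.1.3.el7.x86_64

# Point the Lustre client build at that tree explicitly
sh autogen.sh
./configure --disable-server \
            --with-linux=/usr/src/kernels/3.10.0-957.1.3.el7.x86_64
make rpms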
| Comment by Campbell Mcleay (Inactive) [ 20/May/19 ] |
|
I think some feature missing from EL 7.4 caused the build to break against kernel 3.10.0-957, since it built fine against kernel 3.10.0-693, for example. I ended up building it on a 7.6 host and that worked, so it is now installed and I'll let you know if we get any results. Cheers, |
| Comment by Yang Sheng [ 21/May/19 ] |
|
Hi Campbell, thanks for the info. Please monitor the output of /proc/lnet_spt/spt_table. You can collect it periodically (I think every 1 minute is enough), especially after a soft lockup. Thanks, |
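For example, a trivial collector like this (the one-minute interval comes from above; the output directory is only a suggestion, and /proc/lnet_spt/spt_table is the file added by the debug patch):

#!/bin/bash
# Sample the debug patch's spt_table once a minute into timestamped files.
outdir=/var/tmp/spt_table      # assumption: any local directory is fine
mkdir -p "$outdir"
while true; do
    cat /proc/lnet_spt/spt_table > "$outdir/spt_table.$(date +%Y%m%d-%H%M%S)"
    sleep 60
done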
| Comment by Campbell Mcleay (Inactive) [ 21/May/19 ] |
|
Hi YangSheng, attached is some spt_table data from the time of the lockups (there were 8 lockup events). Let me know if you need spt_table data outside these times as well, as I currently have collection triggered by lockups only. Kind regards, Campbell |
| Comment by Yang Sheng [ 22/May/19 ] |
|
Hi Campbell, the patch gathers the maximum hold time of the CPT lock, so the more recent the data, the better. Thanks, |
| Comment by Campbell Mcleay (Inactive) [ 22/May/19 ] |
|
Hi YangSheng, As it is collecting spt_table data when there are lockups, I assume that it is showing the maximum hold time of the lock on the cpu - or have I got that wrong? Should I just gather the data every minute? Please let me know what periods you will need. Kind regards, Campbell |
| Comment by Yang Sheng [ 22/May/19 ] |
|
Hi Campbell, just the latest data is enough, unless you reload the lnet module after a lockup. From the log, it looks like the delay is not that high. Thanks, |
| Comment by Campbell Mcleay (Inactive) [ 22/May/19 ] |
|
Should I use 'lustre_rmmod' and then 'modprobe lustre' after a lockup is detected? And should I keep sending you data after lockups or do you have enough to work with for now? Kind regards, Campbell |
| Comment by Yang Sheng [ 22/May/19 ] |
|
Hi Campbell,
-- Should I use 'lustre_rmmod' and then 'modprobe lustre' after a lockup is detected?
-- Should I keep sending you data after lockups or do you have enough to work with for now?
Thanks, |
| Comment by Campbell Mcleay (Inactive) [ 22/May/19 ] |
|
Latest one (only one CPU core locked up):

LNetEQAlloc:000:0,0,0,0,0,0,0,0,0,0:3: |
| Comment by Yang Sheng [ 23/May/19 ] |
|
Hi Campbell, could you please collect data as below:

# lctl get_param cpu_partition_table

Thanks, |
| Comment by Campbell Mcleay (Inactive) [ 23/May/19 ] |
|
Hi YangSheng, all clients have:

cpu_partition_table=

Regards, |
| Comment by Yang Sheng [ 23/May/19 ] |
|
Hi Campbell, please add this line to /etc/modprobe.d/ko2iblnd.conf:

options libcfs cpu_npartitions=6

Then reload the Lustre modules and check whether the lockup is still hit. Please confirm the setting is effective with 'lctl get_param cpu_partition_table'. Thanks, |
| Comment by Campbell Mcleay (Inactive) [ 23/May/19 ] |
|
Hi YangSheng, I added the modprobe line and reloaded the Lustre modules, but it is not working:

May 23 18:39:20 bravo2 kernel: LNet: HW NUMA nodes: 2, HW CPU cores: 24, npartitions: 2

cpu_partition_table=

/etc/modprobe.d/ko2iblnd.conf:
alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4 libcfs cpu_npartitions=6
install ko2iblnd /usr/sbin/ko2iblnd-probe

Having a look at what I'm doing wrong |
| Comment by Yang Sheng [ 24/May/19 ] |
|
Hi Campbell, please add "options libcfs cpu_npartitions=6" as a NEW line. Alternatively, you can use 'modprobe libcfs cpu_npartitions=6'. Thanks, |
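So, roughly (the echo below is just one way to append it as its own line; the existing ko2iblnd-opa options line stays untouched, and the client needs to be remounted afterwards):

# Append the libcfs option on its own line in the conf file
echo 'options libcfs cpu_npartitions=6' >> /etc/modprobe.d/ko2iblnd.conf

# Or test it once without editing the file at all
lustre_rmmod
modprobe libcfs cpu_npartitions=6
modprobe lustre
lctl get_param cpu_partition_table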
| Comment by Yang Sheng [ 24/May/19 ] |
|
Hi Campbell, I note that you have 2 NUMA nodes, so we need to partition explicitly as below:

options libcfs cpu_pattern=[0,2,4,6,8,10]1[12,14,16,18,20,22]2[1,3,5,7,9,11]3[13,15,17,19,21,23]

Or you can use 'modprobe libcfs cpu_pattern=[0,2,4,6,8,10]1[12,14,16,18,20,22]2[1,3,5,7,9,11]3[13,15,17,19,21,23]'. Thanks, |
| Comment by Campbell Mcleay (Inactive) [ 24/May/19 ] |
|
Hi YangSheng, had to modify it slightly to work, as it complained:

May 24 10:38:49 bravo2 kernel: LNetError: 21221:0:(linux-cpu.c:1151:cfs_cpu_init()) Failed to create cptab from pattern '[0,2,4,6,8,10]1[12,14,16,18,20,22]2[1,3,5,7,9,11]3[13,15,17,19,21,23]'

Modified cpu_pattern to have a partition number for the first set, so I have:

alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
options libcfs cpu_npartitions=6
options libcfs cpu_pattern=0[0,2,4,6,8,10]1[12,14,16,18,20,22]2[1,3,5,7,9,11]3[13,15,17,19,21,23]
install ko2iblnd /usr/sbin/ko2iblnd-probe

So I get:

cpu_partition_table=

Which looks like what we want, I assume. |
| Comment by Yang Sheng [ 24/May/19 ] |
|
Hi Campbell, yes, I am sorry, there was a typo in my comment. Please test with this pattern to see whether the lockup can still be reproduced. BTW: the 'options libcfs cpu_npartitions=6' line can be removed. Thanks, |
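So the libcfs part of the conf file should end up as just the pattern line, something like this sketch:

# /etc/modprobe.d/ko2iblnd.conf (libcfs portion only)
options libcfs cpu_pattern=0[0,2,4,6,8,10]1[12,14,16,18,20,22]2[1,3,5,7,9,11]3[13,15,17,19,21,23]

# after 'lustre_rmmod' and 'modprobe lustre', this should report
# four CPU partitions of six CPUs each
lctl get_param cpu_partition_table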
| Comment by Yang Sheng [ 27/May/19 ] |
|
Hi Campbell, could you please tell me the status at your site? Are you still collecting spt_table data? Thanks, |
| Comment by Campbell Mcleay (Inactive) [ 04/Jun/19 ] |
|
Hi YangSheng, I'm not collecting spt_table data at the moment, but I also haven't seen any soft lockups since the changes were made. So what next from here? Do I just add these options to all clients on 2.10.7? Or is there a patch imminent to prevent the issue with the default CPU topology? Kind regards, Campbell |
| Comment by Yang Sheng [ 04/Jun/19 ] |
|
Hi Campbell, I think you can apply this change to all of the clients that might be impacted by this issue. I'll try to push a patch to make this change easier, but I think it could take a long time, so can we close this one first? BTW: you can go back to your original Lustre version to remove the debug patch. Thanks, |
| Comment by Campbell Mcleay (Inactive) [ 04/Jun/19 ] |
|
Thanks Yangsheng. So the proposed patch will be to modify ko2iblnd.conf? |
| Comment by Yang Sheng [ 05/Jun/19 ] |
|
Hi Campbell, no, it will set the CPT configuration automatically, so we won't need to set it manually. We already do this for UMA nodes, but it looks like not on NUMA nodes. Thanks, |
| Comment by Campbell Mcleay (Inactive) [ 06/Jun/19 ] |
|
Hi YangSheng, What is the general rule for setting cpu_npartitions - is it number of NUMA node cpus divided by no. of NUMA nodes? Thanks, Campbell |
| Comment by Campbell Mcleay (Inactive) [ 12/Jun/19 ] |
|
Hi YangSheng, Are you able to confirm what the general rule is for partitioning? Thanks, |
| Comment by Yang Sheng [ 12/Jun/19 ] |
|
Hi Campbell, you can refer to the documentation at http://doc.lustre.org/lustre_manual.xhtml#dbdoclet.libcfstuning. But we still don't have a detailed standard for CPT configuration, since it really depends on the situation. Thanks, |
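For what it is worth, a quick way to see the NUMA layout before choosing a pattern (standard tools, nothing Lustre-specific assumed):

# Show how many NUMA nodes there are and which CPUs belong to each,
# then pick cpu_pattern groups that stay within a single node
# (on bravo2: 24 cores across 2 nodes, split into 4 CPTs of 6 CPUs each).
lscpu | grep -i numa
numactl --hardware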
| Comment by Campbell Mcleay (Inactive) [ 25/Jun/19 ] |
|
Hi YangSheng, I didn't see a patch in the 2.10.7 -> 2.10.8 changelog that will set NUMA topology - you mentioned it may take some time to get this patched - do you think it may get done within the next few months? I'm just wondering whether to wait for the patches and upgrade. Thanks, Campbell |
| Comment by Yang Sheng [ 26/Jun/19 ] |
|
Hi Campbell, I am testing the patch in our test cluster. Yes, I think it will land in the next few months. You can set it up via cpu_pattern before then. Thanks, |