[LU-11391] soft lockup in ldlm_prepare_lru_list() Created: 18/Sep/18  Updated: 22/Nov/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Stephane Thiell Assignee: Yang Sheng
Resolution: Unresolved Votes: 0
Labels: None
Environment:

CentOS 7.5 patchfull and Lustre 2.11.55 on AMD EPYC servers


Attachments: Text File foreach_bt.txt     Text File vmcore-dmesg.txt    
Issue Links:
Duplicate
is duplicated by LU-11693 Soft lockups on Lustre clients Open
Related
is related to LU-9230 soft lockup on v2.9 Lustre clients (l... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Testing the master branch, tag 2.11.55, I hit soft lockups in ldlm_prepare_lru_list() (workqueue: ldlm_pools_recalc_task) on a single client running mdtest from the IO-500 benchmark.

[212288.213417] NMI watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [kworker/35:1:600]
[212288.221336] Modules linked in: mgc(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase rpcsec_gss_krb5 dell_rbu auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_ucm rpcrdma rdma_ucm ib_uverbs ib_iser ib_umad rdma_cm iw_cm libiscsi ib_ipoib scsi_transport_iscsi ib_cm mlx5_ib ib_core sunrpc vfat fat amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg dm_multipath ccp dm_mod pcspkr shpchp i2c_piix4 ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
[212288.294305]  fb_sys_fops ttm mlx5_core crct10dif_pclmul drm ahci mlxfw crct10dif_common tg3 libahci crc32c_intel devlink megaraid_sas ptp libata i2c_core pps_core
[212288.307953] CPU: 35 PID: 600 Comm: kworker/35:1 Kdump: loaded Tainted: G           OEL ------------   3.10.0-862.9.1.el7_lustre.x86_64 #1
[212288.320378] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.3.6 04/20/2018
[212288.328069] Workqueue: events ldlm_pools_recalc_task [ptlrpc]
[212288.333925] task: ffff908bfc470000 ti: ffff908bfc464000 task.ti: ffff908bfc464000
[212288.341491] RIP: 0010:[<ffffffff9fd08ff2>]  [<ffffffff9fd08ff2>] native_queued_spin_lock_slowpath+0x122/0x200
[212288.351518] RSP: 0018:ffff908bfc467be8  EFLAGS: 00000246
[212288.356918] RAX: 0000000000000000 RBX: 0000000000002000 RCX: 0000000001190000
[212288.364139] RDX: ffff906bffb99740 RSI: 0000000001b10000 RDI: ffff90abfa32953c
[212288.371358] RBP: ffff908bfc467be8 R08: ffff908bffb19740 R09: 0000000000000000
[212288.378577] R10: 0000fbd0948bcb20 R11: 7fffffffffffffff R12: ffff907ca99fd018
[212288.385796] R13: 0000000000000000 R14: 0000000000018b40 R15: 0000000000018b40
[212288.393017] FS:  00007f96c2fbb740(0000) GS:ffff908bffb00000(0000) knlGS:0000000000000000
[212288.401190] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[212288.407023] CR2: 00007f8de6b4da88 CR3: 0000000aaf40e000 CR4: 00000000003407e0
[212288.414244] Call Trace:
[212288.416793]  [<ffffffffa0309510>] queued_spin_lock_slowpath+0xb/0xf
[212288.423151]  [<ffffffffa0316840>] _raw_spin_lock+0x20/0x30
[212288.428755]  [<ffffffffc0fb9280>] ldlm_pool_set_clv+0x20/0x40 [ptlrpc]
[212288.435391]  [<ffffffffc0f9c956>] ldlm_cancel_lrur_policy+0xd6/0x100 [ptlrpc]
[212288.442639]  [<ffffffffc0f9e4ca>] ldlm_prepare_lru_list+0x1fa/0x4c0 [ptlrpc]
[212288.449797]  [<ffffffffc0f9c880>] ? ldlm_iter_helper+0x20/0x20 [ptlrpc]
[212288.456522]  [<ffffffffc0fa3e31>] ldlm_cancel_lru+0x61/0x170 [ptlrpc]
[212288.463076]  [<ffffffffc0fb7741>] ldlm_cli_pool_recalc+0x231/0x240 [ptlrpc]
[212288.470148]  [<ffffffffc0fb785c>] ldlm_pool_recalc+0x10c/0x1f0 [ptlrpc]
[212288.476874]  [<ffffffffc0fb7abc>] ldlm_pools_recalc_delay+0x17c/0x1d0 [ptlrpc]
[212288.484208]  [<ffffffffc0fb7cd3>] ldlm_pools_recalc_task+0x1c3/0x260 [ptlrpc]
[212288.491431]  [<ffffffff9fcb35ef>] process_one_work+0x17f/0x440
[212288.497356]  [<ffffffff9fcb4686>] worker_thread+0x126/0x3c0
[212288.503016]  [<ffffffff9fcb4560>] ? manage_workers.isra.24+0x2a0/0x2a0
[212288.509629]  [<ffffffff9fcbb621>] kthread+0xd1/0xe0
[212288.514594]  [<ffffffff9fcbb550>] ? insert_kthread_work+0x40/0x40
[212288.520776]  [<ffffffffa03205e4>] ret_from_fork_nospec_begin+0xe/0x21
[212288.527300]  [<ffffffff9fcbb550>] ? insert_kthread_work+0x40/0x40
[212288.533479] Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 40 97 01 00 48 03 14 c5 a0 53 93 a0 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 

I triggered a crash dump that can be made available if anyone is interested, just let me know. Attaching vmcore-dmesg.txt and the output of foreach bt.
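For anyone looking at the dump, the foreach bt output in foreach_bt.txt can be regenerated from the vmcore with the crash utility roughly along these lines (a sketch only; the vmlinux and vmcore paths are placeholders, the kernel version is the one from the soft lockup report above):

# Open the dump against the matching debuginfo kernel (paths are examples)
$ crash /usr/lib/debug/lib/modules/3.10.0-862.9.1.el7_lustre.x86_64/vmlinux /var/crash/<dump-dir>/vmcore

# Inside crash: load module debuginfo, then dump backtraces of all tasks to a file
crash> mod -S
crash> foreach bt > foreach_bt.txt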

Client was running the following part of the IO-500 benchmark:

[Starting] mdtest_easy_stat
[Exec] mpirun -np 24 /home/sthiell/io-500-dev/bin/mdtest -T -F -d /firbench/nodom/datafiles/io500.2018.09.17-19.30.06/mdt_easy -n 200000 -u -L -x /firbench/nodom/datafiles/io500.2018.09.17-19.30.06/mdt_easy-stonewall

 
Best,
Stephane



 Comments   
Comment by Yang Sheng [ 18/Sep/18 ]

Hi, Stephane,

Could you please upload the vmcore to our FTP site (ftp.whamcloud.com)? It would be best to pack it together with the debuginfo RPMs.

Thanks,
YangSheng

Comment by Stephane Thiell [ 18/Sep/18 ]

Hi YangSheng,

Done, uploaded as LU11391-vmcore-pack.tar with debuginfo rpms included. Hope that helps!

Best,
Stephane

Comment by Andreas Dilger [ 11/Oct/18 ]

Stephane, could you please try setting the LDLM LRU size to avoid the LRU getting too large:

client$ lctl set_param ldlm.namespaces.*.lru_size=50000

This might avoid the lockup that you are seeing. We are looking at making this the default for an upcoming release, since it seems to be a common problem.
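For reference, a minimal sketch of checking how large the per-namespace LRU has actually grown before and after applying the cap, using the standard client-side ldlm.namespaces.* tunables (a value of lru_size=0 means the LRU is unlimited and sized dynamically by the pool code):

# How many locks are currently cached per namespace on this client
client$ lctl get_param ldlm.namespaces.*.lock_count

# Current LRU limit; 0 = unlimited / dynamically sized
client$ lctl get_param ldlm.namespaces.*.lru_size

# Apply the suggested cap (takes effect immediately, but is not persistent across remounts)
client$ lctl set_param ldlm.namespaces.*.lru_size=50000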

Comment by Yang Sheng [ 12/Oct/18 ]

Hi, Stephane,

I have investigated the vmcore. It looks like we lost the timing of the lockup. From the stack trace you attached, the thread was spinning on pl_lock. It seems no one should be able to hold this lock for a long time except on the server side, but this instance is a client. Anyway, I'll try to reproduce it on my side.
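For reference, the spinning worker and the contended pool lock could be inspected in a crash session along these lines (a sketch; <pl_addr> is a placeholder for the ldlm_pool address, which would have to be read out of the ldlm_cli_pool_recalc frame):

# Backtrace of the soft-locked worker (PID 600 / CPU 35 in the report above)
crash> bt 600

# Load the ptlrpc debuginfo so the Lustre types resolve, then dump the spinlock
crash> mod -S
crash> struct ldlm_pool.pl_lock <pl_addr>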

Thanks,
YangSheng

Comment by Johann Peyrard (Inactive) [ 20/Nov/18 ]

We had the same issue last week.

The only way we found to reduce these NMI soft lockup messages to almost nothing was to adjust these two parameters:

$ lctl set_param ldlm.namespaces.*.lru_size=10000

$ lctl set_param ldlm.namespaces.*.lru_max_age=1000
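A sketch of sanity-checking these values on the client before and after the change (the unit of lru_max_age may depend on the Lustre version, so comparing against the default is worthwhile; lock_count shows whether the cap is actually being respected):

# Defaults before the change
$ lctl get_param ldlm.namespaces.*.lru_size ldlm.namespaces.*.lru_max_age

# After the change: number of locks actually cached per namespace
$ lctl get_param ldlm.namespaces.*.lock_count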

 

Regards,

Johann
