[LU-11890] soft lockups in ldiskfs_es_lru_add Created: 26/Jan/19 Updated: 16/Jan/22 Resolved: 16/Jan/22 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Stephane Thiell | Assignee: | Mikhail Pershin |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS 7.6, ldiskfs |
||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
The following backtraces can be seen on a new MDS with 2.12.0 (with DNE, PFL and DoM enabled):

[717785.597502] Lustre: fir-MDT0000: Client 85c45b71-10aa-be9f-071b-9a9c50548db2 (at 10.8.15.7@o2ib6) reconnecting
[717785.607591] Lustre: Skipped 1 previous similar message
[717785.612864] Lustre: fir-MDT0000: Connection restored to ff1032ee-a0fe-1065-6b12-2d8b61fa9745 (at 10.8.15.7@o2ib6)
[717785.623208] Lustre: Skipped 1 previous similar message
[717786.125444] Lustre: 5936:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1548496141/real 0] req@ffff911f54e31e00 x1623664975829664/t0(0) o13->fir-OST0000-osc-MDT0000@10.0.10.101@o2ib7:7/4 lens 224/368 e 0 to 1 dl 1548496148 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
[717786.152781] Lustre: 5936:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
[717786.162526] Lustre: fir-OST0000-osc-MDT0000: Connection to fir-OST0000 (at 10.0.10.101@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[717786.178676] Lustre: Skipped 7 previous similar messages
[717787.122451] NMI watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [mdt_io01_012:7223]
[717787.157454] NMI watchdog: BUG: soft lockup - CPU#13 stuck for 22s! [mdt_io01_014:8952]
[717787.130365] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd
[717787.157456] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ses enclosure dm_multipath pcspkr ipmi_si ipmi_devintf dm_mod ccp k10temp ipmi_msghandler sg i2c_piix4 acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper ahci mlx5_core(OE) syscopyarea sysfillrect sysimgblt fb_sys_fops mlxfw(OE) libahci devlink ttm crct10dif_pclmul crct10dif_common mlx_compat(OE) tg3 drm crc32c_intel libata ptp drm_panel_orientation_quirks megaraid_sas pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs]
[717787.157530] CPU: 13 PID: 8952 Comm: mdt_io01_014 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1
[717787.157531] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018
[717787.157533] task: ffff91599d3d1040 ti: ffff9159f42a4000 task.ti: ffff9159f42a4000
[717787.157534] RIP: 0010:[<ffffffffadd1228e>] [<ffffffffadd1228e>] native_queued_spin_lock_slowpath+0x1ce/0x200
[717787.157542] RSP: 0018:ffff9159f42a7798 EFLAGS: 00000202
[717787.157543] RAX: 0000000000000001 RBX: ffff911e33293a30 RCX: 0000000000000001
[717787.157544] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff9149fce5fc80
[717787.157545] RBP: ffff9159f42a7798 R08: 0000000000000101 R09: ffffffffc12b0d1a
[717787.157546] R10: ffff9139ff6df140 R11: ffffdbb24f22f600 R12: ffff9159f42a7738
[717787.157546] R13: ffff911e33293b18 R14: 0000000000000000 R15: 1fffffffffffffff
[717787.157548] FS: 00007f277afbf700(0000) GS:ffff9139ff6c0000(0000) knlGS:0000000000000000
[717787.157549] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[717787.157550] CR2: 00007f2781327000 CR3: 000000402c810000 CR4: 00000000003407e0
[717787.157551] Call Trace:
[717787.157556] [<ffffffffae35bfcb>] queued_spin_lock_slowpath+0xb/0xf
[717787.157560] [<ffffffffae36a480>] _raw_spin_lock+0x20/0x30
[717787.157574] [<ffffffffc129d507>] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs]
[717787.157586] [<ffffffffc12b0d35>] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs]
[717787.157593] [<ffffffffc0b34ae2>] ? kiblnd_launch_tx+0xd2/0xc20 [ko2iblnd]
[717787.157599] [<ffffffffadf210f2>] ? crypto_create_tfm+0x32/0xd0
[717787.157609] [<ffffffffc1336181>] ? qsd_op_begin+0xb1/0x4b0 [lquota]
[717787.157618] [<ffffffffc126c518>] ldiskfs_map_blocks+0x98/0x700 [ldiskfs]
[717787.157625] [<ffffffffc1336181>] ? qsd_op_begin+0xb1/0x4b0 [lquota]
[717787.157633] [<ffffffffc126ab85>] ? ldiskfs_inode_attach_jinode+0x55/0xd0 [ldiskfs]
[717787.157647] [<ffffffffc13a2443>] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs]
[717787.157657] [<ffffffffc13a3b32>] osd_write_commit+0x3a2/0x8c0 [osd_ldiskfs]
[717787.157667] [<ffffffffc12922c9>] ? __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
[717787.157689] [<ffffffffc1552b93>] mdt_obd_commitrw+0x953/0x1150 [mdt]
[717787.157742] [<ffffffffc0ed0e2c>] obd_commitrw+0x9c/0x370 [ptlrpc]
[717787.157784] [<ffffffffc0ed3a9d>] tgt_brw_write+0x100d/0x1a90 [ptlrpc]
[717787.157821] [<ffffffffc0e6d2e0>] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc]
[717787.157828] [<ffffffffadce011c>] ? update_curr+0x14c/0x1e0
[717787.157830] [<ffffffffadcdca7e>] ? account_entity_dequeue+0xae/0xd0
[717787.157862] [<ffffffffc0e28a30>] ? target_send_reply_msg+0x170/0x170 [ptlrpc]
[717787.157903] [<ffffffffc0ed735a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[717787.157943] [<ffffffffc0eb0a51>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[717787.157951] [<ffffffffc0abcbde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[717787.157989] [<ffffffffc0e7b92b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[717787.158026] [<ffffffffc0e787b5>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[717787.158029] [<ffffffffadcd67c2>] ? default_wake_function+0x12/0x20
[717787.158032] [<ffffffffadccba9b>] ? __wake_up_common+0x5b/0x90
[717787.158070] [<ffffffffc0e7f25c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[717787.158106] [<ffffffffc0e7e760>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[717787.158110] [<ffffffffadcc1c31>] kthread+0xd1/0xe0
[717787.158112] [<ffffffffadcc1b60>] ? insert_kthread_work+0x40/0x40
[717787.158115] [<ffffffffae374c24>] ret_from_fork_nospec_begin+0xe/0x21
[717787.158117] [<ffffffffadcc1b60>] ? insert_kthread_work+0x40/0x40
[717787.158117] Code: 37 81 fe 00 01 00 00 74 f4 e9 93 fe ff ff 0f 1f 80 00 00 00 00 83 fa 01 75 11 0f 1f 00 e9 68 fe ff ff 0f 1f 00 85 c0 74 0c f3 90 <8b> 07 0f b6 c0 83 f8 03 75 f0 b8 01 00 00 00 66 89 07 5d c3 66
[717787.165452] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [mdt_io02_012:7222]
etc.

ldiskfs config:
[root@fir-md1-s1 ~]# dumpe2fs -h /dev/mapper/md1-rbod1-mdt1
dumpe2fs 1.44.3.wc1 (23-July-2018)
Filesystem volume name: fir-MDT0001
Last mounted on: /
Filesystem UUID: ab333144-bee5-4e2a-b4bb-aa55c29b1718
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr dir_index filetype needs_recovery extent 64bit mmp flex_bg dirdata sparse_super large_file huge_file uninit_bg dir_nlink quota
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 288005760
Block count: 4681213440
Reserved block count: 234060672
Free blocks: 4607819718
Free inodes: 288004250
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 2016
Inode blocks per group: 504
Flex block group size: 16
Filesystem created: Thu Jan 24 14:02:56 2019
Last mount time: Fri Jan 25 11:45:51 2019
Last write time: Fri Jan 25 11:45:51 2019
Mount count: 4
Maximum mount count: -1
Last checked: Thu Jan 24 14:02:56 2019
Check interval: 0 (<none>)
Lifetime writes: 275 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 1024
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: df4e92b3-3ca4-4642-bcf8-333ebb00b894
Journal backup: inode blocks
MMP block number: 10335
MMP update interval: 5
User quota inode: 3
Group quota inode: 4
Journal features: journal_incompat_revoke journal_64bit
Journal size: 4096M
Journal length: 1048576
Journal sequence: 0x00289850
Journal start: 316606
MMP_block:
mmp_magic: 0x4d4d50
mmp_check_interval: 10
mmp_sequence: 0x004078
mmp_update_date: Sat Jan 26 10:41:07 2019
mmp_update_time: 1548528067
mmp_node_name: fir-md1-s2
mmp_device_name: dm-2
The impact seems to have been temporary, as right now it's working fine again. We're running some large fpsync jobs to synchronize datasets from an old Lustre filesystem, so this is mostly writes. Stephane |
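In both lockup traces the CPUs are spinning in native_queued_spin_lock_slowpath called from ldiskfs_es_lru_add, i.e. on a single per-filesystem spinlock protecting the extent-status LRU list. The sketch below is for illustration only: it is modeled on the pre-3.13 upstream ext4 extent-status cache (fs/ext4/extents_status.c) that this RHEL7-era ldiskfs is believed to be patched from, with approximate field and structure names, not the exact ldiskfs source.

/*
 * Illustrative sketch only -- not the shipped ldiskfs code.
 * Assumption: the pre-3.13 ext4 extent-status LRU design, with one
 * s_es_lru list and one s_es_lru_lock per superblock.
 */
void ldiskfs_es_lru_add(struct inode *inode)
{
        struct ldiskfs_inode_info *ei  = LDISKFS_I(inode);
        struct ldiskfs_sb_info    *sbi = LDISKFS_SB(inode->i_sb);

        ei->i_touch_when = jiffies;

        if (!list_empty(&ei->i_es_lru))
                return;

        /* One spinlock for the whole filesystem: every block-mapping call
         * (ldiskfs_ext_map_blocks -> ldiskfs_es_lru_add) from every mdt_io
         * thread funnels through here, which is where the backtraces show
         * the CPUs stuck in native_queued_spin_lock_slowpath. */
        spin_lock(&sbi->s_es_lru_lock);
        if (list_empty(&ei->i_es_lru))
                list_add_tail(&ei->i_es_lru, &sbi->s_es_lru);
        spin_unlock(&sbi->s_es_lru_lock);
}

If the code in use follows this shape, heavy parallel writes to many new files (the fpsync workload) would make a large number of mdt_io service threads contend on the same s_es_lru_lock at once, which matches the threads named in the watchdog messages.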
| Comments |
| Comment by Peter Jones [ 27/Jan/19 ] |
|
Mike, can you please investigate? Thanks, Peter |
| Comment by Mikhail Pershin [ 02/Feb/19 ] |
|
Was Lustre built from source, or were our RPMs used? Also, could you provide the exact kernel version? |
| Comment by Stephane Thiell [ 04/Feb/19 ] |
|
Lustre was built from source using the 2.12 patch series.
The kernel version is 3.10.0-957.1.3.el7_lustre.x86_64. Thx, Stephane |
| Comment by Peter Jones [ 23/Mar/19 ] |
|
Stephane, are you still seeing this? Peter |
| Comment by Stephane Thiell [ 24/Mar/19 ] |
|
Hi Peter, No, I think we hit that issue only once, when we used fpsync (parallel rsync) just before putting the filesystem into production to sync some files. We don't do that anymore, and it hasn't occurred in production so far. |
| Comment by Peter Jones [ 24/Mar/19 ] |
|
ok - thanks |
| Comment by Stephane Thiell [ 02/May/19 ] |
|
We noticed this again tonight on Fir MDS fir-md1-s1 (serving fir-MDT0000 and MDT0002).

May 01 23:42:36 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [mdt_io00_057:103101]
May 01 23:42:36 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [mdt_io01_029:102923]
...
May 01 23:42:36 fir-md1-s1 kernel: Call Trace:
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffb755bfcb>] queued_spin_lock_slowpath+0xb/0xf
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffb756a480>] _raw_spin_lock+0x20/0x30
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc121e507>] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc1231d35>] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffb6f022f2>] ? ktime_get+0x52/0xe0
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc0ab2632>] ? kiblnd_check_sends_locked+0xa72/0xe40 [ko2iblnd]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc11ed518>] ldiskfs_map_blocks+0x98/0x700 [ldiskfs]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffb6f01052>] ? ktime_get_ts64+0x52/0xf0
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc1323443>] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc1325306>] osd_write_prep+0x2b6/0x360 [osd_ldiskfs]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc14d33d7>] mdt_obd_preprw+0x637/0x1060 [mdt]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc0e51aee>] tgt_brw_write+0xc7e/0x1a90 [ptlrpc]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc0deb6c0>] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc0deb6d7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffb6ee011c>] ? update_curr+0x14c/0x1e0
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffb6edca7e>] ? account_entity_dequeue+0xae/0xd0
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffb6edb738>] ? __enqueue_entity+0x78/0x80
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc0e437ad>] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc0e5573a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc0e2ee31>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc0a3abde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc0df9d0b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc0df6b95>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffb6ed67c2>] ? default_wake_function+0x12/0x20
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffb6ecba9b>] ? __wake_up_common+0x5b/0x90
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc0dfd63c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffc0dfcb40>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffb6ec1c31>] kthread+0xd1/0xe0
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffb6ec1b60>] ? insert_kthread_work+0x40/0x40
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffb7574c24>] ret_from_fork_nospec_begin+0xe/0x21
May 01 23:42:36 fir-md1-s1 kernel: [<ffffffffb6ec1b60>] ? insert_kthread_work+0x40/0x40

Then we can see many other call traces. After that, it looks like some requests timed out, but then the MDS recovered by itself.

Attaching kernel logs since last MDT start on

[root@fir-rbh01 ~]# lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
fir-MDT0000_UUID     18287292984  1755730844 15595268472  10% /fir[MDT:0]
fir-MDT0001_UUID     18287292984  2648570156 14702311140  15% /fir[MDT:1]
fir-MDT0002_UUID     18287292984  4061224344 13289761996  23% /fir[MDT:2]
fir-MDT0003_UUID     18287292984  2629575076 14721428952  15% /fir[MDT:3]
...
[root@fir-rbh01 ~]# lfs df -i
UUID                      Inodes       IUsed       IFree IUse% Mounted on
fir-MDT0000_UUID       288005760    55021564   232984196  10% /fir[MDT:0]
fir-MDT0001_UUID       288005760    71435196   216570564  15% /fir[MDT:1]
fir-MDT0002_UUID       288005760   135215271   152790489  23% /fir[MDT:2]
fir-MDT0003_UUID       288005760    61789653   226216107  15% /fir[MDT:3]
...

Note: this happened about an hour after we decided to start Robinhood again; it's just reading changelogs and doing its thing (stat()...). MDT storage is SSD-based. |
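Both occurrences end on the same ldiskfs_es_lru_add spinlock; only the entry point differs (osd_write_commit in the January trace, osd_write_prep here). As a rough, purely illustrative stand-in for what happens when many I/O service threads funnel through one such lock, here is a small user-space model (this is not Lustre or kernel code; thread and iteration counts are arbitrary):

/* Stand-alone contention demo: N threads hammering one spinlock.
 * Build: gcc -O2 -pthread contention_demo.c -o contention_demo */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 32
#define ITERS    200000

static pthread_spinlock_t lock;

static void *worker(void *arg)
{
        volatile unsigned long dummy = 0;

        for (int i = 0; i < ITERS; i++) {
                pthread_spin_lock(&lock);
                for (int j = 0; j < 200; j++)   /* short critical section */
                        dummy++;
                pthread_spin_unlock(&lock);
        }
        return NULL;
}

int main(void)
{
        pthread_t tid[NTHREADS];
        struct timespec t0, t1;

        pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < NTHREADS; i++)
                pthread_create(&tid[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; i++)
                pthread_join(tid[i], NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("%d threads finished in %.2f s\n", NTHREADS,
               (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
        return 0;
}

Comparing a run with NTHREADS set to 1 against a run with 32 makes the serialization obvious: the work per thread is constant, but the wall-clock time grows with the number of contenders. On the MDS, the same effect sustained for tens of seconds across the mdt_io threads is presumably what trips the 22-23 s soft-lockup watchdog.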
| Comment by Mikhail Pershin [ 02/May/19 ] |
|
So it still exists. I will check the logs; thanks for reporting that. |