Details
- Type: Bug
- Resolution: Duplicate
- Priority: Minor
Description
This issue was created by maloo for hongchao.zhang <hongchao@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/f5d53b98-50a4-4c4a-aaf1-2af5e066916c
test_0b failed with the following error:
onyx-99vm1 crashed during replay-dual test_0b
Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/107851 - 4.18.0-513.24.1.el8_9.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/107851 - 4.18.0-513.24.1.el8_lustre.x86_64
[ 1357.186785] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [tgt_recover_0:38322]
[ 1357.188475] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr i2c_piix4 virtio_balloon ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net serio_raw net_failover virtio_blk failover [last unloaded: obdecho]
[ 1357.199361] CPU: 0 PID: 38322 Comm: tgt_recover_0 Kdump: loaded Tainted: P OE --------- - - 4.18.0-513.24.1.el8_lustre.x86_64 #1
[ 1357.201784] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 1357.202927] RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
[ 1357.204222] Code: 24 40 00 00 00 00 8b 40 2c 89 44 24 20 49 8b 46 38 48 8d 74 24 38 4c 89 f7 48 8b 00 e8 be d6 08 d0 48 85 c0 0f 84 f1 01 00 00 <48> 8b 18 48 85 db 0f 84 cb 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
[ 1357.207751] RSP: 0018:ffffab88859dfdf8 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
[ 1357.209240] RAX: ffffab88836d0008 RBX: 0000000000000000 RCX: 000000000000000e
[ 1357.210637] RDX: ffffab888369f000 RSI: ffffab88859dfe30 RDI: ffff8964fb5def00
[ 1357.212022] RBP: ffffffffc11d5440 R08: 0000000000000e20 R09: 000000000000000e
[ 1357.213413] R10: ffff8965039d3000 R11: ffffab88859dfb20 R12: 0000000000000000
[ 1357.214803] R13: ffff896515203978 R14: ffff8964fb5def00 R15: 0000000000000000
[ 1357.216199] FS: 0000000000000000(0000) GS:ffff89657fc00000(0000) knlGS:0000000000000000
[ 1357.217778] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1357.218912] CR2: 00007f9f31db50c0 CR3: 000000000b610002 CR4: 00000000001706f0
[ 1357.220311] Call Trace:
[ 1357.220881] <IRQ>
[ 1357.221361] ? watchdog_timer_fn.cold.10+0x46/0x9e
[ 1357.222377] ? watchdog+0x30/0x30
[ 1357.223074] ? __hrtimer_run_queues+0x101/0x280
[ 1357.224015] ? hrtimer_interrupt+0x100/0x220
[ 1357.224886] ? smp_apic_timer_interrupt+0x6a/0x130
[ 1357.225907] ? apic_timer_interrupt+0xf/0x20
[ 1357.226780] </IRQ>
[ 1357.227255] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc]
[ 1357.228945] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
[ 1357.230059] ? cfs_hash_for_each_relax+0x172/0x480 [libcfs]
[ 1357.231178] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc]
[ 1357.232397] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc]
[ 1357.233621] cfs_hash_for_each_nolock+0x124/0x200 [libcfs]
[ 1357.234720] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc]
[ 1357.235974] target_recovery_thread+0xe0f/0x1360 [ptlrpc]
[ 1357.237153] ? replay_request_or_update.isra.30+0xa50/0xa50 [ptlrpc]
[ 1357.238494] kthread+0x134/0x150
[ 1357.239211] ? set_kthread_struct+0x50/0x50
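For triage, it may help to separate the confirmed frames in the trace above from the entries prefixed with "?", which the kernel prints for addresses found on the stack that the unwinder could not verify as part of the call chain. The following is a minimal sketch, not part of the original report: the function name `parse` and the two regular expressions are hypothetical helpers written for this log layout, and the sample input is a subset of the lines above.

```python
import re

# Subset of the console log above, copied verbatim.
TRACE = """\
[ 1357.186785] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [tgt_recover_0:38322]
[ 1357.227255] ? ldlm_lock_mode_downgrade+0x2f0/0x2f0 [ptlrpc]
[ 1357.228945] ? cfs_hash_for_each_relax+0x17b/0x480 [libcfs]
[ 1357.233621] cfs_hash_for_each_nolock+0x124/0x200 [libcfs]
[ 1357.234720] ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc]
[ 1357.235974] target_recovery_thread+0xe0f/0x1360 [ptlrpc]
[ 1357.237153] ? replay_request_or_update.isra.30+0xa50/0xa50 [ptlrpc]
[ 1357.238494] kthread+0x134/0x150
"""

# Hypothetical pattern for the watchdog header line.
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)

# Hypothetical pattern for one frame: optional "? ", symbol+offset/size, optional [module].
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<guess>\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?\s*$"
)


def parse(trace):
    """Return (lockup_info, confirmed_frames) for a soft-lockup console log."""
    lockup = None
    frames = []
    for line in trace.splitlines():
        m = LOCKUP_RE.search(line)
        if m:
            lockup = {
                "cpu": int(m["cpu"]),
                "secs": int(m["secs"]),
                "comm": m["comm"],
                "pid": int(m["pid"]),
            }
            continue
        m = FRAME_RE.match(line)
        # Skip "?" entries: they are unverified stack contents, not confirmed callers.
        if m and not m["guess"]:
            frames.append((m["sym"], m["mod"]))
    return lockup, frames


if __name__ == "__main__":
    info, frames = parse(TRACE)
    print(info)
    for sym, mod in frames:
        print(f"{sym} [{mod}]" if mod else sym)
```

Applied to the sample, the confirmed chain is kthread, target_recovery_thread, ldlm_reprocess_recovery_done, cfs_hash_for_each_nolock, which agrees with the RIP line placing the stuck CPU in cfs_hash_for_each_relax.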
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
replay-dual test_0b - onyx-99vm1 crashed during replay-dual test_0b
Issue Links
- duplicates: LU-18031 soft lockup in cfs_hash_for_each_relax (Open)