Details
- Type: Bug
- Resolution: Unresolved
- Priority: Minor
- Fix Version/s: None
- Affects Version/s: Lustre 2.15.5
- Labels: None
- Severity: 3
- Rank: 9223372036854775807
Description
This issue was created by maloo for sarah <sarah@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/375e5764-84b4-41ec-8cf2-b109216ea063
test_60a failed with the following error:
onyx-99vm1 crashed during sanity test_60a
Test session details:
clients: https://build.whamcloud.com/job/lustre-b2_15/88 - 5.14.0-427.18.1.el9_4.x86_64
servers: https://build.whamcloud.com/job/lustre-b2_15/88 - 4.18.0-513.24.1.el8_lustre.x86_64
[ 4362.533695] LustreError: 171557:0:(libcfs_fail.h:169:cfs_race()) cfs_race id 1317 sleeping
[ 4363.083782] LustreError: 171577:0:(libcfs_fail.h:180:cfs_race()) cfs_fail_race id 1317 waking
[ 4363.085558] LustreError: 171557:0:(libcfs_fail.h:178:cfs_race()) cfs_fail_race id 1317 awake: rc=4450
[ 4364.107775] LustreError: 171577:0:(libcfs_fail.h:180:cfs_race()) cfs_fail_race id 1317 waking
[ 4364.109524] Lustre: 171577:0:(llog_test.c:1470:cat_check_old_cb()) seeing record at index 3 - [0x1:0x4be:0x0] in log [0xa:0x11:0x0]
[ 4364.688147] LustreError: 171557:0:(libcfs_fail.h:180:cfs_race()) cfs_fail_race id 1317 waking
[ 4365.708742] Lustre: 171557:0:(llog_test.c:2075:llog_test_10()) 10h: wrote 64767 records then 0 failed with ENOSPC
[ 4365.710822] Lustre: 171557:0:(llog_test.c:2088:llog_test_10()) 10: put newly-created catalog
[ 4366.910031] Lustre: DEBUG MARKER: /usr/sbin/lctl dk
[ 4368.148192] Lustre: DEBUG MARKER: which llog_reader 2> /dev/null
[ 4368.502516] Lustre: DEBUG MARKER: ls -d /usr/sbin/llog_reader
[ 4369.709382] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
[ 4370.032161] Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1
[ 4372.532113] Lustre: Failing over lustre-MDT0000
[ 4375.225434] Lustre: lustre-MDT0000: Not available for connect from 10.240.29.251@tcp (stopping)
[ 4378.342533] Lustre: lustre-MDT0000: Not available for connect from 10.240.29.252@tcp (stopping)
[ 4387.945411] Lustre: 11292:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1718047804/real 1718047804] req@000000009650aa81 x1801499268665600/t0(0) o400->lustre-OST0004-osc-MDT0000@10.240.29.253@tcp:28/4 lens 224/224 e 0 to 1 dl 1718047811 ref 1 fl Rpc:RXNQ/0/ffffffff rc 0/-1 job:'kworker/u4:0.0'
[ 4387.955128] Lustre: lustre-OST0004-osc-MDT0000: Connection to lustre-OST0004 (at 10.240.29.253@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 4387.958383] Lustre: Skipped 2 previous similar messages
[ 4387.959857] Lustre: lustre-MDT0000: Not available for connect from 10.240.29.251@tcp (stopping)
[ 4387.961600] Lustre: Skipped 15 previous similar messages
[ 4388.020779] Lustre: lustre-OST0004-osc-MDT0000: Connection restored to 10.240.29.253@tcp (at 10.240.29.253@tcp)
[ 4388.271300] Lustre: MGS: Client 6a7488a2-5141-4026-8a48-15608255ca8e (at 10.240.29.251@tcp) reconnecting
[ 4389.853159] LustreError: 172062:0:(lprocfs_jobstats.c:137:job_stat_exit()) should not have any items
[ 4389.855119] LustreError: 172062:0:(lprocfs_jobstats.c:137:job_stat_exit()) Skipped 6 previous similar messages
[ 4390.265347] LustreError: 172069:0:(client.c:1256:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@000000007642e2d0 x1801499268667136/t0(0) o101->lustre-MDT0000-lwp-MDT0000@0@lo:23/10 lens 456/496 e 0 to 0 dl 0 ref 2 fl Rpc:QU/0/ffffffff rc 0/-1 job:'qsd_reint_0.lus.0'
[ 4390.270035] LustreError: 172069:0:(qsd_reint.c:56:qsd_reint_completion()) lustre-MDT0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x10000:0x0], rc:-5
[ 4392.020862] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.240.29.251@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 4392.024378] LustreError: Skipped 8 previous similar messages
[ 4393.271120] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.240.29.253@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 4397.062969] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.240.29.251@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 4397.066497] LustreError: Skipped 7 previous similar messages
[ 4398.474282] Lustre: 172062:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1718047819/real 1718047819] req@000000007bbd2132 x1801499268667584/t0(0) o251->MGC10.240.28.44@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1718047825 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'umount.0'
[ 4398.479856] Lustre: 172062:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[ 4427.970012] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [umount:172062]
[ 4427.971533] Modules linked in: obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr virtio_balloon i2c_piix4 sunrpc ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel serio_raw virtio_net net_failover virtio_blk failover [last unloaded: llog_test]
[ 4427.996085] CPU: 0 PID: 172062 Comm: umount Kdump: loaded Tainted: G OE --------- - - 4.18.0-513.24.1.el8_lustre.x86_64 #1
[ 4427.998467] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 4427.999615] RIP: 0010:memset_erms+0x9/0x20
[ 4428.147666] Code: 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66
[ 4428.151228] RSP: 0018:ffffa9f9c133bb30 EFLAGS: 00010246 ORIG_RAX: ffffffffffffff13
[ 4428.152710] RAX: ffff96b543ea525a RBX: ffff96b580f3fc00 RCX: 0000000000100000
[ 4428.154121] RDX: 0000000000100000 RSI: 000000000000005a RDI: ffff96b579100000
[ 4428.155527] RBP: ffff96b580f3fc28 R08: 0000000000000001 R09: ffff96b579100000
[ 4428.156937] R10: ffff96b5787d5000 R11: 0000000000000001 R12: ffff96b580f3fc00
[ 4428.158342] R13: ffff96b5787d5800 R14: dead000000000200 R15: dead000000000100
[ 4428.171578] FS: 00007fc8771da080(0000) GS:ffff96b5ffc00000(0000) knlGS:0000000000000000
[ 4428.173156] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4428.174301] CR2: 00007f6943409050 CR3: 000000001bc80001 CR4: 00000000001706f0
[ 4428.175710] Call Trace:
[ 4428.252421] <IRQ>
[ 4428.254032] ? watchdog_timer_fn.cold.10+0x46/0x9e
[ 4428.278435] ? watchdog+0x30/0x30
[ 4428.279153] ? __hrtimer_run_queues+0x101/0x280
[ 4428.280620] ? hrtimer_interrupt+0x100/0x220
[ 4428.281501] ? smp_apic_timer_interrupt+0x6a/0x130
[ 4428.289409] ? apic_timer_interrupt+0xf/0x20
[ 4428.290302] </IRQ>
[ 4428.290773] ? memset_erms+0x9/0x20
[ 4428.291510] ptlrpc_service_purge_all+0x422/0xa80 [ptlrpc]
[ 4428.293182] ptlrpc_unregister_service+0x422/0x940 [ptlrpc]
[ 4428.294389] ? kmem_cache_alloc_trace+0x142/0x280
[ 4428.303771] ? lprocfs_counter_add+0xd2/0x140 [obdclass]
[ 4428.305122] mds_stop_ptlrpc_service+0x69/0x1b0 [mdt]
[ 4428.322721] mds_device_fini+0x28/0xd0 [mdt]
[ 4428.323661] class_cleanup+0x6f5/0xc90 [obdclass]
[ 4428.392742] class_process_config+0x3ad/0x2080 [obdclass]
[ 4428.393908] ? class_manual_cleanup+0x191/0x780 [obdclass]
[ 4428.395093] ? __kmalloc+0x113/0x250
[ 4428.402142] class_manual_cleanup+0x456/0x780 [obdclass]
[ 4428.403265] server_put_super+0xc8b/0x1350 [obdclass]
[ 4428.404575] ? evict_inodes+0x160/0x1b0
[ 4428.405889] generic_shutdown_super+0x6c/0x110
[ 4428.433075] kill_anon_super+0x14/0x30
[ 4428.433877] deactivate_locked_super+0x34/0x70
[ 4428.446805] cleanup_mnt+0x3b/0x70
[ 4428.448698] task_work_run+0x8a/0xb0
[ 4428.485783] exit_to_usermode_loop+0xef/0x100
[ 4428.542958] do_syscall_64+0x19c/0x1b0
[ 4428.543778] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 4428.560793] RIP: 0033:0x7fc876144e9b
[ 4428.561572] Code: ff d0 48 89 c7 b8 3c 00 00 00 0f 05 48 8b 0d e4 4f 38 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd 4f 38 00 f7 d8 64 89 01 48
[ 4428.565120] RSP: 002b:00007fff58347e08 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 4428.566604] RAX: 0000000000000000 RBX: 0000564708b059c0 RCX: 00007fc876144e9b
[ 4428.568005] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000564708b0b390
[ 4428.569417] RBP: 0000000000000000 R08: 0000564708b0bf20 R09: 0000564708b00010
[ 4428.570820] R10: 0000000000000000 R11: 0000000000000246 R12: 0000564708b0b390
[ 4428.572230] R13: 00007fc876fb6184 R14: 0000564708b05ba0 R15: 00000000ffffffff
[ 4428.573821] Kernel panic - not syncing: softlockup: hung tasks
[ 4428.574998] CPU: 0 PID: 172062 Comm: umount Kdump: loaded Tainted: G OEL --------- - - 4.18.0-513.24.1.el8_lustre.x86_64 #1
[ 4428.577378] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 4428.578518] Call Trace:
[ 4428.582866] <IRQ>
[ 4428.583355] dump_stack+0x41/0x60
[ 4428.604195] panic+0xe7/0x2ac
[ 4428.606288] watchdog_timer_fn.cold.10+0x85/0x9e
[ 4428.622294] ? watchdog+0x30/0x30
[ 4428.623035] __hrtimer_run_queues+0x101/0x280
[ 4428.635497] hrtimer_interrupt+0x100/0x220
[ 4428.636677] smp_apic_timer_interrupt+0x6a/0x130
[ 4428.650956] apic_timer_interrupt+0xf/0x20
[ 4428.651834] </IRQ>
[ 4428.652312] RIP: 0010:memset_erms+0x9/0x20
[ 4428.653162] Code: 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66
[ 4428.656714] RSP: 0018:ffffa9f9c133bb30 EFLAGS: 00010246 ORIG_RAX: ffffffffffffff13
[ 4428.658216] RAX: ffff96b543ea525a RBX: ffff96b580f3fc00 RCX: 0000000000100000
[ 4428.659631] RDX: 0000000000100000 RSI: 000000000000005a RDI: ffff96b579100000
[ 4428.661036] RBP: ffff96b580f3fc28 R08: 0000000000000001 R09: ffff96b579100000
[ 4428.677284] R10: ffff96b5787d5000 R11: 0000000000000001 R12: ffff96b580f3fc00
[ 4428.678708] R13: ffff96b5787d5800 R14: dead000000000200 R15: dead000000000100
[ 4428.680117] ptlrpc_service_purge_all+0x422/0xa80 [ptlrpc]
[ 4428.681308] ptlrpc_unregister_service+0x422/0x940 [ptlrpc]
[ 4428.682506] ? kmem_cache_alloc_trace+0x142/0x280
[ 4428.683465] ? lprocfs_counter_add+0xd2/0x140 [obdclass]
[ 4428.684603] mds_stop_ptlrpc_service+0x69/0x1b0 [mdt]
[ 4428.685667] mds_device_fini+0x28/0xd0 [mdt]
[ 4428.686591] class_cleanup+0x6f5/0xc90 [obdclass]
[ 4428.687609] class_process_config+0x3ad/0x2080 [obdclass]
[ 4428.688752] ? class_manual_cleanup+0x191/0x780 [obdclass]
[ 4428.689910] ? __kmalloc+0x113/0x250
[ 4428.690678] class_manual_cleanup+0x456/0x780 [obdclass]
[ 4428.691816] server_put_super+0xc8b/0x1350 [obdclass]
[ 4428.692893] ? evict_inodes+0x160/0x1b0
[ 4428.693699] generic_shutdown_super+0x6c/0x110
[ 4428.694611] kill_anon_super+0x14/0x30
[ 4428.695401] deactivate_locked_super+0x34/0x70
[ 4428.696311] cleanup_mnt+0x3b/0x70
[ 4428.697034] task_work_run+0x8a/0xb0
[ 4428.697791] exit_to_usermode_loop+0xef/0x100
[ 4428.698688] do_syscall_64+0x19c/0x1b0
[ 4428.699482] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 4428.700517] RIP: 0033:0x7fc876144e9b
[ 4428.701276] Code: ff d0 48 89 c7 b8 3c 00 00 00 0f 05 48 8b 0d e4 4f 38 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd 4f 38 00 f7 d8 64 89 01 48
[ 4428.704828] RSP: 002b:00007fff58347e08 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 4428.706314] RAX: 0000000000000000 RBX: 0000564708b059c0 RCX: 00007fc876144e9b
[ 4428.746088] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000564708b0b390
[ 4428.747498] RBP: 0000000000000000 R08: 0000564708b0bf20 R09: 0000564708b00010
[ 4428.748908] R10: 0000000000000000 R11: 0000000000000246 R12: 0000564708b0b390
[ 4428.750319] R13: 00007fc876fb6184 R14: 0000564708b05ba0 R15: 00000000ffffffff
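For context, the cfs_race()/cfs_fail_race() messages at the top of the log come from libcfs's two-thread race fault injection around fail id 1317: the first thread to reach the fail point logs "sleeping" and blocks until a second thread reaches the same point, logs "waking", and releases it (the first thread then reports "awake: rc=..."). The snippet below is only a minimal userspace sketch of that handshake using plain pthreads and hypothetical names (race_point(), worker()); it illustrates the synchronization pattern the messages describe and is not the libcfs implementation.

/*
 * Sketch (hypothetical, NOT libcfs code) of the race-style fault-injection
 * handshake seen in the log. Build with: cc -pthread race_sketch.c
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool waiter_present;     /* a thread is parked at the fail point */

static void race_point(int id)
{
        pthread_mutex_lock(&lock);
        if (!waiter_present) {
                /* first thread to arrive parks until its partner shows up */
                waiter_present = true;
                printf("cfs_race id %d sleeping\n", id);
                while (waiter_present)
                        pthread_cond_wait(&cond, &lock);
                printf("cfs_fail_race id %d awake\n", id);
        } else {
                /* second thread releases the sleeper and continues */
                printf("cfs_fail_race id %d waking\n", id);
                waiter_present = false;
                pthread_cond_broadcast(&cond);
        }
        pthread_mutex_unlock(&lock);
}

static void *worker(void *arg)
{
        race_point(1317);
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
}

The crash itself happens later in the log: while umount -d /mnt/lustre-mds1 is tearing down the MDT, PID 172062 gets stuck in memset_erms called from ptlrpc_service_purge_all() (reached from server_put_super() and class_manual_cleanup() via mds_stop_ptlrpc_service() and ptlrpc_unregister_service()), CPU 0 makes no scheduling progress for more than 22 seconds, the soft-lockup watchdog fires, and the node panics with "softlockup: hung tasks". This is the same umount-time soft-lockup signature as the related LU-17946.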
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_60a - onyx-99vm1 crashed during sanity test_60a
Attachments
Issue Links
- is related to: LU-17946 sanity test_818: watchdog: BUG: soft lockup - CPU#1 stuck for 22s! umount (Resolved)