[LU-17178] Lustre crashes in conf-sanity test 135 Created: 10/Oct/23  Updated: 10/Oct/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Alexey Lyashkov Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

Description

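Single-node run. The node hits a general protection fault in lu_context_key_get() on the hsm_cdtr thread shortly after conf-sanity test 135 starts: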

[11169.977987] Lustre: DEBUG MARKER: == conf-sanity test 133: stripe QOS: free space balance in a pool ========================================================== 18:42:04 (1696866124)
[11170.103821] Lustre: DEBUG MARKER: SKIP: conf-sanity test_133 needs >= 4 OSTs
[11170.487800] Lustre: DEBUG MARKER: == conf-sanity test 134: check_iam works without faults == 18:42:05 (1696866125)
[11176.168649] Lustre: DEBUG MARKER: == conf-sanity test 135: check the behavior when changelog is wrapped around ========================================================== 18:42:10 (1696866130)
[11177.788724] Lustre: DEBUG MARKER: devel5: executing set_hostid
[11181.755031] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[11183.383655] LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[11185.328129] LDISKFS-fs (loop3): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[11186.223204] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[11186.418034] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[11186.766415] Lustre: Setting parameter lustre-MDT0000.mdt.identity_upcall in log lustre-MDT0000
[11186.767467] Lustre: Skipped 1 previous similar message
[11186.878494] Lustre: ctl-lustre-MDT0000: No data found on store. Initialize space: rc = -61
[11186.879509] Lustre: Skipped 1 previous similar message
[11186.929896] Lustre: lustre-MDT0000: new disk, initializing
[11187.299548] Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400]:0:mdt
[11188.127137] Lustre: 393831:0:(mdt_coordinator.c:571:cdt_start_pending_restore()) (efault): trying to init HSM before MDD
[11188.128484] kasan: CONFIG_KASAN_INLINE enabled
[11188.129037] kasan: GPF could be caused by NULL-ptr deref or user memory access
[11188.129970] general protection fault: 0000 [#1] SMP KASAN PTI
[11188.130627] CPU: 5 PID: 393831 Comm: hsm_cdtr Tainted: G    B   W  OE    ---------r-  - 4.18.0-305.25.1.el8_4.x86_64+debug #1
[11188.131924] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module_el8.7.0+3346+68867adb 04/01/2014
[11188.133163] RIP: 0010:lu_context_key_get+0xc1/0x140 [obdclass]
[11188.133862] Code: 00 fc ff df 49 8d 7d 10 48 89 fa 48 c1 ea 03 80 3c 02 00 75 7b 48 b8 00 00 00 00 00 fc ff df 49 03 5d 10 48 89 da 48 c1 ea 03 <80> 3c 02 00 75 56 48 8b 03 5b 5d 41 5c 41 5d c3 e8 0a 52 a3 d5 e8
[11188.136053] RSP: 0018:ffff88800df8fc90 EFLAGS: 00010206
[11188.136668] RAX: dffffc0000000000 RBX: 00000000000000b8 RCX: 0000000000000000
[11188.137515] RDX: 0000000000000017 RSI: ffffffffc1bd0cb8 RDI: ffff888055f9cd48
[11188.138345] RBP: 0000000000000017 R08: ffffed1020116c79 R09: ffffed1020116c79
[11188.139233] R10: ffff88800df8fcb8 R11: ffffed1020116c78 R12: ffffffffc32a1460
[11188.140065] R13: ffff888055f9cd38 R14: 0000000000000014 R15: ffff888055f9c000
[11188.140885] FS:  0000000000000000(0000) GS:ffff888110000000(0000) knlGS:0000000000000000
[11188.141839] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11188.142498] CR2: 000055bfdd4e5f44 CR3: 0000000114a2c004 CR4: 0000000000020ee0
[11188.143359] Call Trace:
[11188.143704]  mdt_coordinator+0x34a/0x4250 [mdt]
[11188.144316]  ? __switch_to_asm+0x41/0x70
[11188.144784]  ? __switch_to_asm+0x41/0x70
[11188.145268]  ? __switch_to_asm+0x35/0x70
[11188.145724]  ? rcu_read_unlock+0x50/0x50
[11188.146190]  ? __switch_to_asm+0x35/0x70
[11188.146652]  ? __switch_to_asm+0x41/0x70
[11188.147145]  ? __switch_to_asm+0x35/0x70
[11188.147609]  ? __switch_to_asm+0x41/0x70
[11188.148090]  ? __switch_to_asm+0x35/0x70
[11188.148572]  ? __switch_to_asm+0x41/0x70
[11188.149145]  ? _raw_spin_unlock_irq+0x24/0x40
[11188.149771]  ? mdt_hsm_policy_seq_write+0x1210/0x1210 [mdt]
[11188.150427]  ? _raw_spin_unlock_irq+0x24/0x40
[11188.150948]  ? lock_release+0x591/0xd70
[11188.151426]  ? finish_task_switch+0x1d1/0x7f0
[11188.151989]  ? lock_acquire+0x34d/0x8a0
[11188.152445]  ? lock_downgrade+0x710/0x710
[11188.152925]  ? rcu_read_unlock+0x50/0x50
[11188.153389]  ? __schedule+0x925/0x1a40
[11188.153841]  ? lock_contended+0xd40/0xd40
[11188.154381]  ? __kthread_parkme+0x52/0x190
[11188.154868]  ? __kthread_parkme+0xc4/0x190
[11188.155549]  ? mdt_hsm_policy_seq_write+0x1210/0x1210 [mdt]
[11188.156216]  kthread+0x344/0x410
[11188.156600]  ? kthread_insert_work_sanity_check+0xd0/0xd0
[11188.157219]  ret_from_fork+0x3a/0x50
[11188.157635] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) loop mbcache jbd2 dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache iTCO_wdt crct10dif_pclmul iTCO_vendor_support crc32_pclmul ghash_clmulni_intel qxl drm_ttm_helper joydev ttm pcspkr drm_kms_helper syscopyarea i2c_i801 sysfillrect lpc_ich virtio_balloon sysimgblt fb_sys_fops drm i6300esb sunrpc vfat fat ip_tables xfs libcrc32c ahci libahci libata crc32c_intel serio_raw e1000 virtio_console virtio_blk virtio_scsi [last unloaded: libcfs]
[11188.165201] ---[ end trace b78137f9856f832e ]---
[11188.165793] RIP: 0010:lu_context_key_get+0xc1/0x140 [obdclass]
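
A rough reading of the register dump, as a sketch rather than a confirmed root cause: the faulting instruction bytes (48 89 da / 48 c1 ea 03 / <80> 3c 02 00) look like the inline KASAN check, i.e. RDX = RBX >> 3 followed by a byte compare at RDX + RAX. With RBX = 0xb8 the address being checked is tiny, which suggests a NULL base pointer plus an offset of 0xb8. The guess that the base is ctx->lc_value and that 0xb8 is key index 23 times the pointer size (RBP is also 0x17) is an assumption and has not been checked against the source. The helper names below are made up for illustration.

```python
# Sketch: reproduce the KASAN shadow-address arithmetic from the oops registers.
# Register values are copied from the trace; the helper names are hypothetical.

MASK64 = (1 << 64) - 1
KASAN_SHADOW_OFFSET = 0xdffffc0000000000  # matches RAX in the oops (x86_64)
KASAN_SHADOW_SCALE_SHIFT = 3              # one shadow byte covers 8 bytes


def shadow_addr(addr):
    """Shadow byte address that inline KASAN would read for addr."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


def is_canonical(addr):
    """True if bits 63..47 are all equal (48-bit x86_64 canonical form)."""
    top = addr >> 47
    return top == 0 or top == 0x1ffff


rbx = 0xb8  # address about to be dereferenced
rdx = 0x17  # RBX >> 3, as left in the register at the fault
rbp = 0x17

shadow = shadow_addr(rbx)
print(hex(shadow), "canonical" if is_canonical(shadow) else "non-canonical")

# Assumption: the base pointer was NULL and the offset is index * sizeof(void *).
guessed_index = rbx // 8
print("guessed key index:", guessed_index)
```

The shadow address for 0xb8 is non-canonical, which would explain why this shows up as a general protection fault instead of a page fault, and is consistent with the "GPF could be caused by NULL-ptr deref" hint printed by KASAN.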

Generated at Sat Feb 10 03:33:17 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.