Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
None
-
None
-
3
-
9223372036854775807
Description
This issue was created by maloo for S Buisson <sbuisson@ddn.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/d533e8c9-2124-4c8c-93ed-0679aa9f2d88
Test session details:
clients: https://build.whamcloud.com/job/lustre-b_es-reviews/18667 - 4.18.0-477.27.1.el8_8.x86_64
servers: https://build.whamcloud.com/job/lustre-b_es-reviews/18667 - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64
[13584.283487] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [dist_txn-1:512872]
[13584.288569] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sunrpc pcspkr joydev virtio_balloon i2c_piix4 ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net serio_raw net_failover virtio_blk failover [last unloaded: libcfs]
[13584.298845] CPU: 1 PID: 512872 Comm: dist_txn-1 Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1
[13584.301372] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[13584.302527] RIP: 0010:native_safe_halt+0xe/0x20
[13584.303499] Code: 00 f0 80 48 02 20 48 8b 00 a8 08 75 c0 e9 79 ff ff ff 90 90 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 46 90 60 00 fb f4 <e9> dd 01 40 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 e9 07 00 00
[13584.307067] RSP: 0018:ffffbd2782f3fb98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[13584.308571] RAX: 0000000000000003 RBX: ffff9fb1a0a8d850 RCX: 0000000000000008
[13584.309988] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff9fb1a0a8d850
[13584.311402] RBP: ffff9fb23fd33d40 R08: 0000000000000008 R09: 0000000000000024
[13584.312815] R10: 0000000000000002 R11: ffffbd2782f3fc18 R12: 0000000000000000
[13584.314220] R13: 0000000000000001 R14: 0000000000000100 R15: 0000000000080000
[13584.315628] FS: 0000000000000000(0000) GS:ffff9fb23fd00000(0000) knlGS:0000000000000000
[13584.317204] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13584.318355] CR2: 00005573c765d4b0 CR3: 00000000a4410001 CR4: 00000000001706e0
[13584.319771] Call Trace:
[13584.320342] kvm_wait+0x58/0x60
[13584.321053] __pv_queued_spin_lock_slowpath+0x268/0x2a0
[13584.322163] _raw_spin_lock+0x1e/0x30
[13584.322945] osp_check_and_set_rpc_version+0x9f/0x240 [osp]
[13584.324184] ? osp_md_write+0x3a0/0x5d0 [osp]
[13584.325102] ? dt_record_write+0x32/0x120 [obdclass]
[13584.326469] ? llog_osd_write_rec+0x6e9/0x1a30 [obdclass]
[13584.327618] ? llog_osd_regular_fid_del_name_entry+0x29c/0x5d0 [obdclass]
[13584.329042] ? llog_write_rec+0x3f7/0x520 [obdclass]
[13584.330098] ? llog_cancel_arr_rec+0x3b2/0xbd0 [obdclass]
[13584.331259] ? llog_cat_cancel_arr_rec+0x1d5/0x430 [obdclass]
[13584.332487] ? llog_cat_cancel_records+0x61/0x190 [obdclass]
[13584.333766] ? distribute_txn_commit_thread+0x3cf/0xbb0 [ptlrpc]
[13584.335530] ? distribute_txn_commit_batchid_update+0x860/0x860 [ptlrpc]
[13584.336959] ? kthread+0x134/0x150
[13584.337702] ? set_kthread_struct+0x50/0x50
[13584.338569] ? ret_from_fork+0x35/0x40
[13584.339360] Kernel panic - not syncing: softlockup: hung tasks
[13584.340553] CPU: 1 PID: 512872 Comm: dist_txn-1 Kdump: loaded Tainted: G OEL --------- - - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1
[13584.343089] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[13584.344251] Call Trace:
[13584.344820] <IRQ>
[13584.345284] dump_stack+0x41/0x60
[13584.346021] panic+0xe7/0x2ac
[13584.346688] watchdog_timer_fn.cold.10+0x85/0x9e
[13584.347674] ? watchdog+0x30/0x30
[13584.348398] __hrtimer_run_queues+0x101/0x280
[13584.349330] hrtimer_interrupt+0x100/0x220
[13584.350190] smp_apic_timer_interrupt+0x6a/0x130
[13584.351152] apic_timer_interrupt+0xf/0x20
[13584.352011] </IRQ>
[13584.352491] RIP: 0010:native_safe_halt+0xe/0x20
[13584.353417] Code: 00 f0 80 48 02 20 48 8b 00 a8 08 75 c0 e9 79 ff ff ff 90 90 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 46 90 60 00 fb f4 <e9> dd 01 40 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 e9 07 00 00
[13584.357002] RSP: 0018:ffffbd2782f3fb98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[13584.358509] RAX: 0000000000000003 RBX: ffff9fb1a0a8d850 RCX: 0000000000000008
[13584.359945] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff9fb1a0a8d850
[13584.361371] RBP: ffff9fb23fd33d40 R08: 0000000000000008 R09: 0000000000000024
[13584.362795] R10: 0000000000000002 R11: ffffbd2782f3fc18 R12: 0000000000000000
[13584.364215] R13: 0000000000000001 R14: 0000000000000100 R15: 0000000000080000
[13584.365636] kvm_wait+0x58/0x60
[13584.366316] __pv_queued_spin_lock_slowpath+0x268/0x2a0
[13584.367379] _raw_spin_lock+0x1e/0x30
[13584.368147] osp_check_and_set_rpc_version+0x9f/0x240 [osp]
[13584.369299] ? osp_md_write+0x3a0/0x5d0 [osp]
[13584.370225] ? dt_record_write+0x32/0x120 [obdclass]
[13584.371303] ? llog_osd_write_rec+0x6e9/0x1a30 [obdclass]
[13584.372449] ? llog_osd_regular_fid_del_name_entry+0x29c/0x5d0 [obdclass]
[13584.373856] ? llog_write_rec+0x3f7/0x520 [obdclass]
[13584.374925] ? llog_cancel_arr_rec+0x3b2/0xbd0 [obdclass]
[13584.376070] ? llog_cat_cancel_arr_rec+0x1d5/0x430 [obdclass]
[13584.377283] ? llog_cat_cancel_records+0x61/0x190 [obdclass]
[13584.378481] ? distribute_txn_commit_thread+0x3cf/0xbb0 [ptlrpc]
[13584.379779] ? distribute_txn_commit_batchid_update+0x860/0x860 [ptlrpc]
[13584.381203] ? kthread+0x134/0x150
[13584.381930] ? set_kthread_struct+0x50/0x50
[13584.382806] ? ret_from_fork+0x35/0x40
Issue Links
- is related to LU-18664 distribute_txn_commit_batchid_update() is called in an atomic context (Open)