[LU-15503] Crash in qsd_upd_thread trying to print a debug message. Created: 29/Jan/22 Updated: 26/Feb/22 Resolved: 07/Feb/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.0 |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Oleg Drokin | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
maloo is hitting this crash:

[125926.340052] general protection fault: 0000 [#1] SMP PTI
[125926.341277] CPU: 1 PID: 825968 Comm: lquota_wb_lustr Kdump: loaded Tainted: G W OE --------- - - 4.18.0-240.22.1.el8_3.x86_64 #1
[125926.343699] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[125926.372056] RIP: 0010:string_nocheck+0x12/0x70
[125926.373076] Code: 00 00 4c 89 e2 be 20 00 00 00 48 89 ef e8 86 93 00 00 4c 01 e3 eb 81 90 49 89 f2 48 89 ce 48 89 f8 48 c1 fe 30 66 85 f6 74 4f <44> 0f b6 0a 45 84 c9 74 46 83 ee 01 41 b8 01 00 00 00 48 8d 7c 37
[125926.376578] RSP: 0018:ffffab2105dd3cb8 EFLAGS: 00010286
[125926.377625] RAX: ffff9afe29483d9f RBX: ffff9afe29484000 RCX: ffff0a00ffffff04
[125926.379021] RDX: 247c894800000028 RSI: ffffffffffffffff RDI: ffff9afe29483d9f
[125926.380429] RBP: 247c894800000028 R08: 0000000000000055 R09: 0000000000000001
[125926.381821] R10: ffff9afe29484000 R11: ffff9afe29483d4f R12: ffff0a00ffffff04
[125926.383218] R13: ffffffffc159a59a R14: 0000000000000261 R15: ffffffffc159a59a
[125926.384612] FS: 0000000000000000(0000) GS:ffff9afebfd00000(0000) knlGS:0000000000000000
[125926.386200] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[125926.387345] CR2: 00007f172b497000 CR3: 000000009ac0a003 CR4: 00000000001606e0
[125926.388746] Call Trace:
[125926.394666] string+0x40/0x50
[125926.403630] vsnprintf+0x33c/0x520
[125926.404461] libcfs_debug_msg+0x83d/0xb00 [libcfs]
[125926.412242] ? try_to_del_timer_sync+0x4d/0x80
[125926.413177] ? __next_timer_interrupt+0xf0/0xf0
[125926.414185] ? qsd_upd_thread+0x86e/0xd20 [lquota]
[125926.415176] qsd_upd_thread+0x86e/0xd20 [lquota]
[125926.416136] ? qsd_upd_add+0x100/0x100 [lquota]
[125926.417086] kthread+0x112/0x130
[125926.417784] ? kthread_flush_work_fn+0x10/0x10
[125926.418703] ret_from_fork+0x35/0x40
[125926.419472] Modules linked in: dm_flakey nfsd nfs_acl obdecho(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul dm_mod ghash_clmulni_intel pcspkr joydev virtio_balloon i2c_piix4 ip_tables ext4 mbcache jbd2 ata_generic ata_piix libata virtio_net crc32c_intel serio_raw net_failover failover virtio_blk [last unloaded: dm_flakey]

There's only a single print in that function so I can only assume list_entry returns garbage?:

    if (count % 7 == 0) {
            n = list_entry(&queue, struct qsd_upd_rec, qur_link);
            CWARN("%s: The reintegration thread [%d] "
                  "blocked more than %ld seconds\n",
                  n->qur_qqi->qqi_qsd->qsd_svname,
                  n->qur_qqi->qqi_qtype,
                  count * cfs_time_seconds(QSD_WB_INTERVAL) / 10);
    }

Example reports:
https://testing.whamcloud.com/test_sets/785c0e7b-cd04-422a-8bc3-9eaacc47d4b0
https://testing.whamcloud.com/test_sets/43f81877-2c6c-411a-990a-911905b85a7f
https://testing.whamcloud.com/test_sets/44640986-5ef4-48cc-a468-beefa26fcd3a
So far this has been observed only in RHEL 8 testing. |
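A minimal userspace sketch of why the list_entry(&queue, ...) call quoted in the description could return garbage. It assumes queue is a standalone list head rather than a member embedded in a struct qsd_upd_rec (the report does not show its declaration). The list helpers are simplified stand-ins for the kernel macros, struct upd_rec and its payload field are hypothetical, and nothing here is taken from patch 46380.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * The subtraction is done on uintptr_t so it stays well defined even
 * when the result does not point into any real object.
 */
#define list_entry(ptr, type, member) \
	((type *)((uintptr_t)(ptr) - offsetof(type, member)))
#define list_first_entry(head, type, member) \
	list_entry((head)->next, type, member)

/* Hypothetical record; only the qur_link name mirrors the report. */
struct upd_rec {
	int payload;
	struct list_head qur_link;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/*
 * Applies list_entry() to the list head itself, as in the quoted code.
 * Returns 1 if the result is the queued record, 0 otherwise. The bogus
 * pointer is only compared, never dereferenced.
 */
static int head_as_entry_matches_record(void)
{
	struct list_head queue;
	struct upd_rec rec = { .payload = 42 };

	list_init(&queue);
	list_add_tail(&rec.qur_link, &queue);

	struct upd_rec *bogus = list_entry(&queue, struct upd_rec, qur_link);
	return bogus == &rec;
}

/* Resolves the first queued record via head->next and reads its payload. */
static int first_entry_payload(void)
{
	struct list_head queue;
	struct upd_rec rec = { .payload = 42 };

	list_init(&queue);
	list_add_tail(&rec.qur_link, &queue);

	assert(!list_empty(&queue)); /* an empty list has no first entry */
	return list_first_entry(&queue, struct upd_rec, qur_link)->payload;
}
```

Under these assumptions, head_as_entry_matches_record() returns 0 and first_entry_payload() returns 42: the head is not embedded in any record, so subtracting the member offset from its address lands on unrelated memory. Dereferencing such a pointer through n->qur_qqi->qqi_qsd->qsd_svname would hand an arbitrary value to the %s conversion, which would be consistent with the fault in string_nocheck shown in the trace.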
| Comments |
| Comment by Gerrit Updater [ 29/Jan/22 ] |
|
"Yang Sheng <ys@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46380 |
| Comment by Gerrit Updater [ 07/Feb/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46380/ |
| Comment by Peter Jones [ 07/Feb/22 ] |
|
Landed for 2.15 |