Details
-
Bug
-
Resolution: Duplicate
-
Major
Description
A relatively new crash has been observed in Maloo testing on RHEL8 over the past several days, during cleanup of recovery-small in review-dne-part-5:
[ 753.814277] Lustre: DEBUG MARKER: == recovery-small test complete, duration 6075 sec ======= 10:32:41 (1629887561)
[ 783.518588] general protection fault: 0000 [#1] SMP PTI
[ 783.519513] CPU: 0 PID: 3045 Comm: mdt_rdpg00_000 Kdump: loaded Tainted: G OE --------- - - 4.18.0-240.22.1.el8_lustre.x86_64 #1
[ 783.521414] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 783.522494] RIP: 0010:llog_exist+0xd9/0x180 [obdclass]
[ 783.523265] Code: c7 05 7f 0f 0c 00 01 00 00 00 e8 a2 53 ee ff 5b c3 48 85 ff 0f 84 aa 00 00 00 48 8b 87 08 01 00 00 48 85 c0 0f 84 9a 00 00 00 <48> 8b 40 50 48 85 c0 74 53 48 89 df e8 b6 b3 77 fb f6 05 2b f6 f0
[ 783.526007] RSP: 0018:ffffae2500ea3ae8 EFLAGS: 00010206
[ 783.526776] RAX: 5a5a5a5a5a5a5a5a RBX: ffff9f95b14bd000 RCX: 0000000000000000
[ 783.527839] RDX: 0000000000000ba5 RSI: 0000000000000000 RDI: ffff9f959e040900
[ 783.528890] RBP: ffff9f959091f0d0 R08: 000000d823bc83f1 R09: 0000000000000bc0
[ 783.530782] R10: ffffae2500ea3ae8 R11: ffff9f95a7d08b6c R12: ffff9f95b04c2080
[ 783.532083] R13: ffff9f95b10d7ec0 R14: ffff9f959091f0d0 R15: ffff9f959132c000
[ 783.533175] FS: 0000000000000000(0000) GS:ffff9f95bfc00000(0000) knlGS:0000000000000000
[ 783.534423] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 783.535314] CR2: 00007fd080ad6000 CR3: 000000008960a005 CR4: 00000000003606f0
[ 783.536412] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 783.537505] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 783.538587] Call Trace:
[ 783.539040] llog_cat_prep_log+0x4f/0x3c0 [obdclass]
[ 783.539833] llog_cat_declare_add_rec+0x56/0x220 [obdclass]
[ 783.540700] llog_declare_add+0x187/0x1d0 [obdclass]
[ 783.541925] top_trans_start+0x212/0x940 [ptlrpc]
[ 783.542820] mdd_attr_set+0x657/0xfe0 [mdd]
[ 783.543538] ? panic_notifier+0x20/0x20 [libcfs]
[ 783.544400] mdt_mfd_close+0x56c/0x8c0 [mdt]
[ 783.545087] mdt_close_internal+0xc4/0x240 [mdt]
[ 783.545820] mdt_close+0x47d/0x8b0 [mdt]
[ 783.546470] tgt_request_handle+0xc90/0x1940 [ptlrpc]
[ 783.547300] ptlrpc_server_handle_request+0x323/0xbc0 [ptlrpc]
[ 783.548246] ptlrpc_main+0xba2/0x1490 [ptlrpc]
[ 783.548964] ? __schedule+0x2cc/0x700
[ 783.549562] ? ptlrpc_wait_event+0x500/0x500 [ptlrpc]
[ 783.550380] kthread+0x112/0x130
[ 783.550894] ? kthread_flush_work_fn+0x10/0x10
[ 783.551577] ret_from_fork+0x35/0x40
[ 783.552136] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) dm_flakey ptlrpc_gss(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm sunrpc ib_core intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul dm_mod ghash_clmulni_intel pcspkr joydev virtio_balloon i2c_piix4 ip_tables ext4 mbcache jbd2 ata_generic ata_piix 8139too libata 8139cp crc32c_intel serio_raw virtio_blk mii
The 3 observed failures are:
https://testing.whamcloud.com/test_sessions/d7bfd1a1-7ebf-40df-b80b-7e104a524f67
https://testing.whamcloud.com/test_sets/f93ef5ed-4963-4851-ac10-3e0477e9543c
and
https://testing.whamcloud.com/test_sets/1976e1d2-dc91-45b3-b29f-691f921ddeff
RAX holds a suspicious value (5a5a5a5a5a5a5a5a), so this is probably an access to freed memory?
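For triaging similar reports, here is a rough sketch of how one might flag registers that look like a fill pattern. It assumes a repeated non-zero byte such as 0x5a comes from free-time poisoning, which is my reading of the dump and not something the log itself confirms. The helper name `repeated_byte_registers` and the regular expression are hypothetical, written for this illustration only.

```python
import re

# Matches general-purpose register dumps such as "RAX: 5a5a5a5a5a5a5a5a"
# in an x86_64 oops. RIP/RSP lines carry a segment prefix ("0010:...")
# and are deliberately not matched.
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}): ([0-9a-f]{16})\b")


def repeated_byte_registers(oops_text):
    """Return {register: byte} for registers made of one repeated byte.

    All-zero and all-ones values are skipped because they are common
    legitimate values (NULL, -1), not fill patterns.
    """
    hits = {}
    for name, value in REG_RE.findall(oops_text):
        first = value[:2]
        if first not in ("00", "ff") and value == first * 8:
            hits[name] = int(first, 16)
    return hits


# Register lines copied from the crash above.
sample = (
    "RAX: 5a5a5a5a5a5a5a5a RBX: ffff9f95b14bd000 RCX: 0000000000000000 "
    "RDX: 0000000000000ba5 RSI: 0000000000000000 RDI: ffff9f959e040900"
)
print(repeated_byte_registers(sample))  # {'RAX': 90}, i.e. byte 0x5a
```

Only RAX is flagged here, and the faulting bytes in the Code: line (<48> 8b 40 50) decode to a load from RAX+0x50, which fits a dereference of a pointer read out of an already-poisoned structure.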