Details
-
Bug
-
Resolution: Cannot Reproduce
-
Blocker
-
None
-
Lustre 2.10.0
-
None
-
MDSCOUNT=4
-
3
-
9223372036854775807
Description
This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/819e44c6-e309-11e6-bac2-5254006e85c2.
The sub-test test_31a failed with the following error:
test failed to respond and timed out
This looks a bit like LU-8362 (I think) but since that is marked Resolved I'm creating a new ticket. I'll let an expert decide if this is an old issue or a new one.
Panic seen on MDS:
08:29:09:[14599.772616] ------------[ cut here ]------------ 08:29:09:[14599.775103] WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0() 08:29:09:[14599.777257] list_del corruption. next->prev should be ffff880039623660, but was (null) 08:29:09: 08:29:09:[14599.779597] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_ssse3 sha512_generic crypto_null libcfs(OE) ldiskfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt nfsd ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr virtio_balloon i2c_piix4 parport_pc parport nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk crct10dif_pclmul crct10dif_common cirrus 8139too crc32c_intel drm_kms_helper serio_raw syscopyarea sysfillrect sysimgblt virtio_pci virtio_ring virtio fb_sys_fops ttm 8139cp mii drm i2c_core ata_piix libata floppy 08:29:09:[14599.802619] CPU: 1 PID: 10612 Comm: lfsck_namespace Tainted: G OE ------------ 3.10.0-514.2.2.el7_lustre.x86_64 #1 08:29:09:[14599.806954] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 08:29:09:[14599.809170] ffff88001f3ef678 000000009fda725f ffff88001f3ef630 ffffffff81686318 08:29:09:[14599.811564] ffff88001f3ef668 ffffffff81085940 ffff880039623660 ffff880039623660 08:29:09:[14599.813971] ffff88001ff2b000 00000000000fa540 0000000000000000 ffff88001f3ef6d0 08:29:09:[14599.816367] Call Trace: 08:29:09:[14599.818307] [<ffffffff81686318>] dump_stack+0x19/0x1b 08:29:09:[14599.820443] [<ffffffff81085940>] warn_slowpath_common+0x70/0xb0 08:29:09:[14599.822596] [<ffffffff810859dc>] warn_slowpath_fmt+0x5c/0x80 08:29:09:[14599.824686] [<ffffffff813332e2>] __list_del_entry+0x82/0xd0 08:29:09:[14599.826767] [<ffffffff8133333d>] list_del+0xd/0x30 08:29:09:[14599.828797] [<ffffffffa07c66d4>] lnet_me_unlink+0x14/0xc0 [lnet] 08:29:09:[14599.830872] [<ffffffffa07caa48>] lnet_md_unlink+0x308/0x3f0 [lnet] 08:29:09:[14599.832960] [<ffffffffa07cb195>] lnet_try_match_md+0x1e5/0x330 [lnet] 08:29:09:[14599.835047] [<ffffffff811a2079>] ? zone_statistics+0x89/0xa0 08:29:09:[14599.837060] [<ffffffffa07cb99c>] lnet_mt_match_md+0x8c/0x1b0 [lnet] 08:29:09:[14599.839109] [<ffffffffa07cbbcd>] lnet_ptl_match_md+0x10d/0x820 [lnet] 08:29:09:[14599.841184] [<ffffffff8123312c>] ? __find_get_block+0xbc/0x120 08:29:09:[14599.843210] [<ffffffffa07d535a>] lnet_parse_local+0x51a/0xd30 [lnet] 08:29:09:[14599.845322] [<ffffffffa06d0c96>] ? ldiskfs_getblk+0xa6/0x200 [ldiskfs] 08:29:09:[14599.847426] [<ffffffffa07d61da>] lnet_parse+0x66a/0xe60 [lnet] 08:29:09:[14599.849483] [<ffffffff81322803>] ? number.isra.2+0x323/0x360 08:29:09:[14599.851530] [<ffffffffa07d76cb>] lolnd_send+0x2b/0xa0 [lnet] 08:29:09:[14599.853564] [<ffffffffa07cf5ef>] lnet_ni_send+0x3f/0xe0 [lnet] 08:29:09:[14599.855572] [<ffffffffa07d3ad8>] lnet_send+0x978/0xc90 [lnet] 08:29:09:[14599.857544] [<ffffffff811de456>] ? kmem_cache_alloc_trace+0x1d6/0x200 08:29:09:[14599.859540] [<ffffffffa07d4035>] LNetPut+0x245/0x7a0 [lnet] 08:29:09:[14599.861456] [<ffffffffa0a9ce33>] ptl_send_buf+0x183/0x500 [ptlrpc] 08:29:09:[14599.863368] [<ffffffff810ea9ba>] ? __getnstimeofday64+0x3a/0xd0 08:29:09:[14599.865234] [<ffffffffa0a9f531>] ptl_send_rpc+0x611/0xda0 [ptlrpc] 08:29:09:[14599.867137] [<ffffffffa0a942b0>] ptlrpc_send_new_req+0x460/0xa60 [ptlrpc] 08:29:09:[14599.869094] [<ffffffffa0a98f11>] ptlrpc_set_wait+0x3d1/0x900 [ptlrpc] 08:29:09:[14599.871037] [<ffffffffa0d922f0>] ? lfsck_create_lpf_local+0xca0/0xca0 [lfsck] 08:29:09:[14599.873018] [<ffffffffa0aa48d5>] ? lustre_msg_set_jobid+0x95/0x100 [ptlrpc] 08:29:09:[14599.874967] [<ffffffffa0a94911>] ? ptlrpc_set_add_req+0x61/0xc0 [ptlrpc] 08:29:09:[14599.876868] [<ffffffffa0d9b2bb>] ? lfsck_async_request+0x16b/0x240 [lfsck] 08:29:09:[14599.878770] [<ffffffffa0da1ba4>] lfsck_assistant_notify_others+0x7d4/0x12c0 [lfsck] 08:29:09:[14599.880723] [<ffffffffa0da6626>] lfsck_assistant_engine+0x866/0x20c0 [lfsck] 08:29:09:[14599.882616] [<ffffffff810ce47c>] ? dequeue_entity+0x11c/0x5d0 08:29:09:[14599.884407] [<ffffffff8168b3e0>] ? __schedule+0x3b0/0x990 08:29:09:[14599.886088] [<ffffffff810c4fe0>] ? wake_up_state+0x20/0x20 08:29:09:[14599.887758] [<ffffffffa0da5dc0>] ? lfsck_master_engine+0x1330/0x1330 [lfsck] 08:29:09:[14599.889508] [<ffffffff810b064f>] kthread+0xcf/0xe0 08:29:09:[14599.891031] [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140 08:29:09:[14599.892664] [<ffffffff81696898>] ret_from_fork+0x58/0x90 08:29:09:[14599.894168] [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140 08:29:09:[14599.895732] ---[ end trace 4894ecda8f001c8c ]--- 08:29:09:[14599.898961] general protection fault: 0000 [#1] SMP 08:29:09:[14599.899008] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_ssse3 sha512_generic crypto_null libcfs(OE) ldiskfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt nfsd ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr virtio_balloon i2c_piix4 parport_pc parport nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk crct10dif_pclmul crct10dif_common cirrus 8139too crc32c_intel drm_kms_helper serio_raw syscopyarea sysfillrect sysimgblt virtio_pci virtio_ring virtio fb_sys_fops ttm 8139cp mii drm i2c_core ata_piix libata floppy 08:29:09:[14599.903021] CPU: 0 PID: 7126 Comm: socknal_sd00_00 Tainted: G W OE ------------ 3.10.0-514.2.2.el7_lustre.x86_64 #1 08:29:09:[14599.903021] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 08:29:09:[14599.903021] task: ffff88007751af10 ti: ffff88007a020000 task.ti: ffff88007a020000 08:29:09:[14599.903021] RIP: 0010:[<ffffffffa07cb987>] [<ffffffffa07cb987>] lnet_mt_match_md+0x77/0x1b0 [lnet] 08:29:09:[14599.903021] RSP: 0018:ffff88007a023c00 EFLAGS: 00010206 08:29:09:[14599.903021] RAX: ffff880039623f60 RBX: ffff880077f21000 RCX: 0000000000000018 08:29:09:[14599.903021] RDX: 0000000000000000 RSI: ffff880077a84e40 RDI: 5a5a5a5a5a5a5a5a 08:29:09:[14599.903021] RBP: ffff88007a023c38 R08: 0000000000000005 R09: 0000282416e40000 08:29:09:[14599.903021] R10: 0001d7dbf32505b9 R11: 0000000000000000 R12: ffff88007a023cf0 08:29:09:[14599.903021] R13: 0000000000000000 R14: ffff88003f48ae00 R15: 0000000000000008 08:29:09:[14599.903021] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 08:29:09:[14599.903021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 08:29:09:[14599.903021] CR2: 00007fa69ade5050 CR3: 0000000079bb2000 CR4: 00000000000406f0 08:29:09:[14599.903021] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 08:29:09:[14599.903021] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 08:29:09:[14599.903021] Stack: 08:29:09:[14599.903021] 0000014000000000 ffff880077a84e40 ffff88003f48ae00 ffff880077a84e40 08:29:09:[14599.903021] ffff88007963b5a0 ffff88007a023cf0 ffff88007963b5a0 ffff88007a023ca8 08:29:09:[14599.903021] ffffffffa07cbbcd 0000000000000000 0000000000000000 0000000000000000 08:29:09:[14599.903021] Call Trace: 08:29:09:[14599.903021] [<ffffffffa07cbbcd>] lnet_ptl_match_md+0x10d/0x820 [lnet] 08:29:09:[14599.903021] [<ffffffffa07d535a>] lnet_parse_local+0x51a/0xd30 [lnet] 08:29:09:[14599.903021] [<ffffffffa07d61da>] lnet_parse+0x66a/0xe60 [lnet] 08:29:09:[14599.903021] [<ffffffffa0a0bda0>] ksocknal_process_receive+0x480/0xda0 [ksocklnd] 08:29:09:[14599.903021] [<ffffffffa0a0ca4e>] ksocknal_scheduler+0xee/0x670 [ksocklnd] 08:29:09:[14599.903021] [<ffffffff810b1720>] ? wake_up_atomic_t+0x30/0x30 08:29:09:[14599.903021] [<ffffffffa0a0c960>] ? ksocknal_recv+0x2a0/0x2a0 [ksocklnd] 08:29:09:[14599.903021] [<ffffffff810b064f>] kthread+0xcf/0xe0 08:29:09:[14599.903021] [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140 08:29:09:[14599.903021] [<ffffffff81696898>] ret_from_fork+0x58/0x90 08:29:09:[14599.903021] [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140 08:29:09:[14599.903021] Code: 48 8b 14 ca 44 8b 42 08 47 8d 3c 00 41 83 e7 08 48 39 d8 75 0d eb 41 0f 1f 44 00 00 4c 89 e8 49 89 d5 48 8b 78 58 48 85 ff 74 24 <48> 3b 47 28 0f 85 fc 00 00 00 4c 89 f2 4c 89 e6 e8 14 f6 ff ff 08:29:09:[14599.903021] RIP [<ffffffffa07cb987>] lnet_mt_match_md+0x77/0x1b0 [lnet] 08:29:09:[14599.903021] RSP <ffff88007a023c00>
Info required for matching: sanity-lfsck 31a
Attachments
Issue Links
- mentioned in
-
Page Loading...