Lustre / LU-9047

sanity-lfsck test_31a: test failed to respond and timed out

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Blocker
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.0
    • Labels: None
    • Environment: MDSCOUNT=4
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/819e44c6-e309-11e6-bac2-5254006e85c2.

      The sub-test test_31a failed with the following error:

      test failed to respond and timed out
      

      This looks a bit like LU-8362 (I think), but since that ticket is marked Resolved I'm creating a new one. I'll let an expert decide whether this is an old issue or a new one.

      Panic seen on the MDS:

      08:29:09:[14599.772616] ------------[ cut here ]------------
      08:29:09:[14599.775103] WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
      08:29:09:[14599.777257] list_del corruption. next->prev should be ffff880039623660, but was           (null)
      08:29:09:
      08:29:09:[14599.779597] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_ssse3 sha512_generic crypto_null libcfs(OE) ldiskfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt nfsd ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr virtio_balloon i2c_piix4 parport_pc parport nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk crct10dif_pclmul crct10dif_common cirrus 8139too crc32c_intel drm_kms_helper serio_raw syscopyarea sysfillrect sysimgblt virtio_pci virtio_ring virtio fb_sys_fops ttm 8139cp mii drm i2c_core ata_piix libata floppy
      08:29:09:[14599.802619] CPU: 1 PID: 10612 Comm: lfsck_namespace Tainted: G           OE  ------------   3.10.0-514.2.2.el7_lustre.x86_64 #1
      08:29:09:[14599.806954] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      08:29:09:[14599.809170]  ffff88001f3ef678 000000009fda725f ffff88001f3ef630 ffffffff81686318
      08:29:09:[14599.811564]  ffff88001f3ef668 ffffffff81085940 ffff880039623660 ffff880039623660
      08:29:09:[14599.813971]  ffff88001ff2b000 00000000000fa540 0000000000000000 ffff88001f3ef6d0
      08:29:09:[14599.816367] Call Trace:
      08:29:09:[14599.818307]  [<ffffffff81686318>] dump_stack+0x19/0x1b
      08:29:09:[14599.820443]  [<ffffffff81085940>] warn_slowpath_common+0x70/0xb0
      08:29:09:[14599.822596]  [<ffffffff810859dc>] warn_slowpath_fmt+0x5c/0x80
      08:29:09:[14599.824686]  [<ffffffff813332e2>] __list_del_entry+0x82/0xd0
      08:29:09:[14599.826767]  [<ffffffff8133333d>] list_del+0xd/0x30
      08:29:09:[14599.828797]  [<ffffffffa07c66d4>] lnet_me_unlink+0x14/0xc0 [lnet]
      08:29:09:[14599.830872]  [<ffffffffa07caa48>] lnet_md_unlink+0x308/0x3f0 [lnet]
      08:29:09:[14599.832960]  [<ffffffffa07cb195>] lnet_try_match_md+0x1e5/0x330 [lnet]
      08:29:09:[14599.835047]  [<ffffffff811a2079>] ? zone_statistics+0x89/0xa0
      08:29:09:[14599.837060]  [<ffffffffa07cb99c>] lnet_mt_match_md+0x8c/0x1b0 [lnet]
      08:29:09:[14599.839109]  [<ffffffffa07cbbcd>] lnet_ptl_match_md+0x10d/0x820 [lnet]
      08:29:09:[14599.841184]  [<ffffffff8123312c>] ? __find_get_block+0xbc/0x120
      08:29:09:[14599.843210]  [<ffffffffa07d535a>] lnet_parse_local+0x51a/0xd30 [lnet]
      08:29:09:[14599.845322]  [<ffffffffa06d0c96>] ? ldiskfs_getblk+0xa6/0x200 [ldiskfs]
      08:29:09:[14599.847426]  [<ffffffffa07d61da>] lnet_parse+0x66a/0xe60 [lnet]
      08:29:09:[14599.849483]  [<ffffffff81322803>] ? number.isra.2+0x323/0x360
      08:29:09:[14599.851530]  [<ffffffffa07d76cb>] lolnd_send+0x2b/0xa0 [lnet]
      08:29:09:[14599.853564]  [<ffffffffa07cf5ef>] lnet_ni_send+0x3f/0xe0 [lnet]
      08:29:09:[14599.855572]  [<ffffffffa07d3ad8>] lnet_send+0x978/0xc90 [lnet]
      08:29:09:[14599.857544]  [<ffffffff811de456>] ? kmem_cache_alloc_trace+0x1d6/0x200
      08:29:09:[14599.859540]  [<ffffffffa07d4035>] LNetPut+0x245/0x7a0 [lnet]
      08:29:09:[14599.861456]  [<ffffffffa0a9ce33>] ptl_send_buf+0x183/0x500 [ptlrpc]
      08:29:09:[14599.863368]  [<ffffffff810ea9ba>] ? __getnstimeofday64+0x3a/0xd0
      08:29:09:[14599.865234]  [<ffffffffa0a9f531>] ptl_send_rpc+0x611/0xda0 [ptlrpc]
      08:29:09:[14599.867137]  [<ffffffffa0a942b0>] ptlrpc_send_new_req+0x460/0xa60 [ptlrpc]
      08:29:09:[14599.869094]  [<ffffffffa0a98f11>] ptlrpc_set_wait+0x3d1/0x900 [ptlrpc]
      08:29:09:[14599.871037]  [<ffffffffa0d922f0>] ? lfsck_create_lpf_local+0xca0/0xca0 [lfsck]
      08:29:09:[14599.873018]  [<ffffffffa0aa48d5>] ? lustre_msg_set_jobid+0x95/0x100 [ptlrpc]
      08:29:09:[14599.874967]  [<ffffffffa0a94911>] ? ptlrpc_set_add_req+0x61/0xc0 [ptlrpc]
      08:29:09:[14599.876868]  [<ffffffffa0d9b2bb>] ? lfsck_async_request+0x16b/0x240 [lfsck]
      08:29:09:[14599.878770]  [<ffffffffa0da1ba4>] lfsck_assistant_notify_others+0x7d4/0x12c0 [lfsck]
      08:29:09:[14599.880723]  [<ffffffffa0da6626>] lfsck_assistant_engine+0x866/0x20c0 [lfsck]
      08:29:09:[14599.882616]  [<ffffffff810ce47c>] ? dequeue_entity+0x11c/0x5d0
      08:29:09:[14599.884407]  [<ffffffff8168b3e0>] ? __schedule+0x3b0/0x990
      08:29:09:[14599.886088]  [<ffffffff810c4fe0>] ? wake_up_state+0x20/0x20
      08:29:09:[14599.887758]  [<ffffffffa0da5dc0>] ? lfsck_master_engine+0x1330/0x1330 [lfsck]
      08:29:09:[14599.889508]  [<ffffffff810b064f>] kthread+0xcf/0xe0
      08:29:09:[14599.891031]  [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
      08:29:09:[14599.892664]  [<ffffffff81696898>] ret_from_fork+0x58/0x90
      08:29:09:[14599.894168]  [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
      08:29:09:[14599.895732] ---[ end trace 4894ecda8f001c8c ]---
      08:29:09:[14599.898961] general protection fault: 0000 [#1] SMP 
      08:29:09:[14599.899008] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_ssse3 sha512_generic crypto_null libcfs(OE) ldiskfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt nfsd ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr virtio_balloon i2c_piix4 parport_pc parport nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk crct10dif_pclmul crct10dif_common cirrus 8139too crc32c_intel drm_kms_helper serio_raw syscopyarea sysfillrect sysimgblt virtio_pci virtio_ring virtio fb_sys_fops ttm 8139cp mii drm i2c_core ata_piix libata floppy
      08:29:09:[14599.903021] CPU: 0 PID: 7126 Comm: socknal_sd00_00 Tainted: G        W  OE  ------------   3.10.0-514.2.2.el7_lustre.x86_64 #1
      08:29:09:[14599.903021] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      08:29:09:[14599.903021] task: ffff88007751af10 ti: ffff88007a020000 task.ti: ffff88007a020000
      08:29:09:[14599.903021] RIP: 0010:[<ffffffffa07cb987>]  [<ffffffffa07cb987>] lnet_mt_match_md+0x77/0x1b0 [lnet]
      08:29:09:[14599.903021] RSP: 0018:ffff88007a023c00  EFLAGS: 00010206
      08:29:09:[14599.903021] RAX: ffff880039623f60 RBX: ffff880077f21000 RCX: 0000000000000018
      08:29:09:[14599.903021] RDX: 0000000000000000 RSI: ffff880077a84e40 RDI: 5a5a5a5a5a5a5a5a
      08:29:09:[14599.903021] RBP: ffff88007a023c38 R08: 0000000000000005 R09: 0000282416e40000
      08:29:09:[14599.903021] R10: 0001d7dbf32505b9 R11: 0000000000000000 R12: ffff88007a023cf0
      08:29:09:[14599.903021] R13: 0000000000000000 R14: ffff88003f48ae00 R15: 0000000000000008
      08:29:09:[14599.903021] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
      08:29:09:[14599.903021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      08:29:09:[14599.903021] CR2: 00007fa69ade5050 CR3: 0000000079bb2000 CR4: 00000000000406f0
      08:29:09:[14599.903021] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      08:29:09:[14599.903021] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      08:29:09:[14599.903021] Stack:
      08:29:09:[14599.903021]  0000014000000000 ffff880077a84e40 ffff88003f48ae00 ffff880077a84e40
      08:29:09:[14599.903021]  ffff88007963b5a0 ffff88007a023cf0 ffff88007963b5a0 ffff88007a023ca8
      08:29:09:[14599.903021]  ffffffffa07cbbcd 0000000000000000 0000000000000000 0000000000000000
      08:29:09:[14599.903021] Call Trace:
      08:29:09:[14599.903021]  [<ffffffffa07cbbcd>] lnet_ptl_match_md+0x10d/0x820 [lnet]
      08:29:09:[14599.903021]  [<ffffffffa07d535a>] lnet_parse_local+0x51a/0xd30 [lnet]
      08:29:09:[14599.903021]  [<ffffffffa07d61da>] lnet_parse+0x66a/0xe60 [lnet]
      08:29:09:[14599.903021]  [<ffffffffa0a0bda0>] ksocknal_process_receive+0x480/0xda0 [ksocklnd]
      08:29:09:[14599.903021]  [<ffffffffa0a0ca4e>] ksocknal_scheduler+0xee/0x670 [ksocklnd]
      08:29:09:[14599.903021]  [<ffffffff810b1720>] ? wake_up_atomic_t+0x30/0x30
      08:29:09:[14599.903021]  [<ffffffffa0a0c960>] ? ksocknal_recv+0x2a0/0x2a0 [ksocklnd]
      08:29:09:[14599.903021]  [<ffffffff810b064f>] kthread+0xcf/0xe0
      08:29:09:[14599.903021]  [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
      08:29:09:[14599.903021]  [<ffffffff81696898>] ret_from_fork+0x58/0x90
      08:29:09:[14599.903021]  [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
      08:29:09:[14599.903021] Code: 48 8b 14 ca 44 8b 42 08 47 8d 3c 00 41 83 e7 08 48 39 d8 75 0d eb 41 0f 1f 44 00 00 4c 89 e8 49 89 d5 48 8b 78 58 48 85 ff 74 24 <48> 3b 47 28 0f 85 fc 00 00 00 4c 89 f2 4c 89 e6 e8 14 f6 ff ff 
      08:29:09:[14599.903021] RIP  [<ffffffffa07cb987>] lnet_mt_match_md+0x77/0x1b0 [lnet]
      08:29:09:[14599.903021]  RSP <ffff88007a023c00>
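
The first warning comes from the kernel's list debugging code, which refuses to unlink an entry whose neighbours no longer point back at it. As an illustration only, here is a minimal user-space sketch of that check, simplified from __list_del_entry() in lib/list_debug.c: the LIST_POISON checks and WARN() reporting are omitted, and list_del_checked() is a hypothetical name. In the trace above, the check fired from list_del() in lnet_me_unlink() because next->prev was NULL.

```c
#include <stdio.h>
#include <stddef.h>

/* Minimal doubly linked list node, modeled on the kernel's struct list_head. */
struct list_head {
	struct list_head *next;
	struct list_head *prev;
};

/*
 * Simplified sketch of the debug check done before an entry is unlinked.
 * The real __list_del_entry() also rejects LIST_POISON1/2 and reports via
 * WARN(); this sketch prints to stderr instead.
 * Returns 0 after unlinking, or -1 (leaving the list untouched) when a
 * neighbour no longer points back at the entry.
 */
static int list_del_checked(struct list_head *entry)
{
	struct list_head *prev = entry->prev;
	struct list_head *next = entry->next;

	if (prev->next != entry) {
		fprintf(stderr,
			"list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)prev->next);
		return -1;
	}
	if (next->prev != entry) {
		/* The case reported in the trace: next->prev was NULL. */
		fprintf(stderr,
			"list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)next->prev);
		return -1;
	}

	/* Neighbours are consistent: splice the entry out of the list. */
	next->prev = prev;
	prev->next = next;
	/* The kernel's list_del() poisons these pointers; NULL is used here. */
	entry->next = NULL;
	entry->prev = NULL;
	return 0;
}
```

For example, building a circular list head <-> a <-> b and then setting b.prev to NULL makes list_del_checked(&a) return -1 and leave the list unchanged, which mirrors the "next->prev should be ..., but was (null)" case in the log.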
      

      Info required for matching: sanity-lfsck 31a

            People

              Assignee: WC Triage
              Reporter: Maloo
              Votes: 0
              Watchers: 6
