Lustre / LU-7098

sanity test_17m: test failed to respond and timed out

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Version: Lustre 2.8.0
    • Severity: 3

    Description

      This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/1d53141c-519f-11e5-84c4-5254006e85c2.

      The sub-test test_17m failed with the following error:

      test failed to respond and timed out
      

      Seen in testing of SLES12 client/server on master.

      Info required for matching: sanity 17m

          Activity

            Joseph Gmitter (jgmitter) added a comment:

            Landed for 2.8

            Gerrit Updater added a comment:

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16804/
            Subject: LU-7098 osd-ldiskfs: don't alloc inode directly
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 323293bab2c8d65c6f1c0b3c04671ed073719cbe

            Gerrit Updater added a comment:

            Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/16804
            Subject: LU-7098 osd-ldiskfs: don't alloc inode directly
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 79a01fd4c0308fc3e242bed7e0d4efe647cd523e

            Bob Glossman added a comment:

            Another instance seen in SLES12 client/server testing on master:
            https://testing.hpdd.intel.com/test_sets/878101f8-62bd-11e5-a45a-5254006e85c2

            From the console log for mds2:

            03:57:07:onyx-44vm7 login: [ 4264.945248] BUG: unable to handle kernel paging request at ffffc9800335e000
            03:57:07:[ 4264.947111] IP: [<ffffffff8151853a>] _raw_spin_lock+0xa/0x30
            03:57:07:[ 4264.948315] PGD 0 
            03:57:07:[ 4264.948761] Oops: 0002 [#1] SMP 
            03:57:08:[ 4264.949076] Modules linked in: osp(OEN) mdd(OEN) lod(OEN) mdt(OEN) lfsck(OEN) mgc(OEN) osd_ldiskfs(OEN) lquota(OEN) fid(OEN) fld(OEN) ksocklnd(OEN) ptlrpc(OEN) obdclass(OEN) lnet(OEN) sha512_generic(E) crypto_null(E) libcfs(OEN) ldiskfs(OEN) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) sunrpc(E) fscache(E) iscsi_boot_sysfs(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) af_packet(E) rdma_cm(E) ib_cm(E) iw_cm(E) ib_sa(E) ib_mad(E) ib_core(E) ib_addr(E) ppdev(E) parport_pc(E) pvpanic(E) serio_raw(E) parport(E) pcspkr(E) virtio_balloon(E) 8139too(E) 8139cp(E) mii(E) button(E) processor(E) i2c_piix4(E) dm_mod(E) ext4(E) crc16(E) mbcache(E) jbd2(E) ata_generic(E) ata_piix(E) ahci(E) libahci(E) virtio_blk(E) floppy(E) uhci_hcd(E) ehci_hcd(E) cirrus(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) drm_kms_helper(E) usbcore(E) usb_common(E) ttm(E) drm(E) virtio_pci(E) virtio_ring(E) virtio(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
            03:57:08:[ 4264.949076] Supported: No, Unsupported modules are loaded
            03:57:08:[ 4264.949076] CPU: 1 PID: 2567 Comm: mdt00_002 Tainted: G           OEN  3.12.44-52.10_lustre.gb2a3954-default #1
            03:57:08:[ 4264.949076] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
            03:57:08:[ 4264.949076] task: ffff88007acaa040 ti: ffff88007acb8000 task.ti: ffff88007acb8000
            03:57:09:[ 4264.949076] RIP: 0010:[<ffffffff8151853a>]  [<ffffffff8151853a>] _raw_spin_lock+0xa/0x30
            03:57:09:[ 4264.949076] RSP: 0018:ffff88007acb9980  EFLAGS: 00010246
            03:57:09:[ 4264.949076] RAX: 0000000000010000 RBX: ffff88006bc77400 RCX: 0000000000000007
            03:57:09:[ 4264.949076] RDX: ffffc9800335e000 RSI: 0000000000000000 RDI: ffffc9800335e000
            03:57:09:[ 4264.949076] RBP: ffff88007b7f3b40 R08: 00000000000000ec R09: 00000000000000ec
            03:57:09:[ 4264.949076] R10: 0000000000000025 R11: 000000000000000e R12: ffff88007b08b140
            03:57:09:[ 4264.949076] R13: 0000000000000000 R14: 000000000000000d R15: 0000000000000001
            03:57:09:[ 4264.949076] FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
            03:57:09:[ 4264.949076] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
            03:57:09:[ 4264.949076] CR2: ffffc9800335e000 CR3: 0000000077c3d000 CR4: 00000000000006e0
            03:57:09:[ 4264.949076] Stack:
            03:57:09:[ 4264.949076]  ffffffffa09e45e7 ffff880036e4d490 ffffc9800335e000 ffffffff00001886
            03:57:09:[ 4264.949076]  ffff88007b7f3b40 ffff88007acb9a48 ffff88006bc77400 0000000000000000
            03:57:10:[ 4264.949076]  000000000000000d ffff88007b7f3b40 ffffffffa09dd170 0000000000000000
            03:57:10:[ 4264.949076] Call Trace:
            03:57:10:[ 4264.949076]  [<ffffffffa09e45e7>] ldlm_resource_get+0x67/0xa30 [ptlrpc]
            03:57:10:[ 4264.949076]  [<ffffffffa09dd170>] ldlm_lock_create+0x60/0xb30 [ptlrpc]
            03:57:10:[ 4264.949076]  [<ffffffffa09f9f2e>] ldlm_cli_enqueue_local+0xce/0x950 [ptlrpc]
            03:57:10:[ 4264.949076]  [<ffffffffa0df972a>] mdt_object_local_lock+0x1ea/0xad0 [mdt]
            03:57:10:[ 4264.949076]  [<ffffffffa0dfacb1>] mdt_getattr_name_lock+0x9f1/0x18a0 [mdt]
            03:57:10:[ 4264.949076]  [<ffffffffa0dfbdef>] mdt_intent_getattr+0x28f/0x440 [mdt]
            03:57:11:[ 4264.949076]  [<ffffffffa0dfef2c>] mdt_intent_policy+0x59c/0xb50 [mdt]
            03:57:11:[ 4264.949076]  [<ffffffffa09ddf63>] ldlm_lock_enqueue+0x323/0x890 [ptlrpc]
            03:57:11:[ 4264.949076]  [<ffffffffa0a06361>] ldlm_handle_enqueue0+0x741/0x1870 [ptlrpc]
            03:57:11:[ 4264.949076]  [<ffffffffa0a897fd>] tgt_enqueue+0x5d/0x210 [ptlrpc]
            03:57:11:[ 4264.949076]  [<ffffffffa0a8dd33>] tgt_request_handle+0x7e3/0x1190 [ptlrpc]
            03:58:35:[ 4264.949076]  [<ffffffffa0a37aa9>] ptlrpc_server_handle_request+0x209/0xa70 [ptlrpc]
            03:58:36:[ 4264.949076]  [<ffffffffa0a3b1ba>] ptlrpc_main+0xb2a/0x1ea0 [ptlrpc]
            03:58:37:[ 4264.949076]  [<ffffffff810770f4>] kthread+0xb4/0xc0
            03:58:37:[ 4264.949076]  [<ffffffff81520618>] ret_from_fork+0x58/0x90
            03:58:37:[ 4264.949076] Code: fa 66 0f 1f 44 00 00 48 83 c7 04 f0 ff 0f 74 05 e8 fc 28 d9 ff 48 89 d0 c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 00 00 01 00 <f0> 0f c1 07 89 c2 c1 ea 10 66 39 c2 89 d1 75 01 c3 0f b7 07 66 
            03:58:38:[ 4264.949076] RIP  [<ffffffff8151853a>] _raw_spin_lock+0xa/0x30
            03:58:38:[ 4264.949076]  RSP <ffff88007acb9980>
            03:58:38:[ 4264.949076] CR2: ffffc9800335e000
            03:58:38:[    0.004005] Failed to access perfctr msr (MSR c1 is 0)
            03:58:38:[    1.334430] systemd[1]: /usr/lib/systemd/system-generators/kdump-device-timeout-generator exited with exit status 2.
            03:58:39:[    4.691663] irq 11: nobody cared (try booting with the "irqpoll" option)
            03:58:39:[    4.692007] handlers:
            03:58:39:[    4.692007] [<ffffffffa00caf80>] usb_hcd_irq [usbcore]
            03:58:39:[    4.692007] Disabling IRQ #11
            03:58:39:Unable to ioctl(KDSETLED) -- are you not on the console? (Inappropriate ioctl for device)
            03:58:39:Deletion of old dump only on local disk.
            03:58:39:Extracting dmesg
            03:58:40:-------------------------------------------------------------------------------
            03:58:40:
            03:58:41:The dmesg log is saved to /mnt/2015-09-23-20:57/dmesg.txt.
            03:58:42:
            03:58:42:makedumpfile Completed.
            03:58:42:-------------------------------------------------------------------------------
            03:58:42:Saving dump using makedumpfile
            03:58:42:-------------------------------------------------------------------------------
            03:58:42:
            Excluding unnecessary pages        : [  0.0 %] /
            Excluding unnecessary pages        : [100.0 %] |
            Excluding unnecessary pages        : [100.0 %] \
            Excluding unnecessary pages        : [  0.0 %] -
            Excluding unnecessary pages        : [100.0 %] /[    7.347488] Out of memory: Kill process 77 (haveged) score 34 or sacrifice child
            03:58:42:[    7.348136] Killed process 77 (haveged) total-vm:12032kB, anon-rss:3124kB, file-rss:652kB
            05:17:27:********** Timeout by autotest system **********
            

            Bob Glossman added a comment:

            Another instance seen on master, also with SLES12 client/server:
            https://testing.hpdd.intel.com/test_sets/b3dea53e-5a5c-11e5-825b-5254006e85c2

            This may be a blocker for landing SLES12 on master. Whatever the problem is, it seems to be master-only; SLES12 tests on other branches have passed.

            People

              Assignee: Yang Sheng (ys)
              Reporter: Maloo (maloo)
              Votes: 0
              Watchers: 4
