[LU-7098] sanity test_17m: test failed to respond and timed out Created: 03/Sep/15 Updated: 08/Jul/16 Resolved: 30/Nov/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>.
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/1d53141c-519f-11e5-84c4-5254006e85c2.
The sub-test test_17m failed with the following error: test failed to respond and timed out.
Seen in test of sles12 client/server on master.
Info required for matching: sanity 17m |
| Comments |
| Comment by Bob Glossman (Inactive) [ 14/Sep/15 ] |
|
Another instance seen on master, also sles12 client/server: this may be a blocker for landing sles12 on master. Whatever the problem is, it seems to be master only; sles12 tests on other branches have passed. |
| Comment by Bob Glossman (Inactive) [ 24/Sep/15 ] |
|
another instance seen in sles12 client/server test on master:
from console log for mds2
03:57:07:onyx-44vm7 login: [ 4264.945248] BUG: unable to handle kernel paging request at ffffc9800335e000
03:57:07:[ 4264.947111] IP: [<ffffffff8151853a>] _raw_spin_lock+0xa/0x30
03:57:07:[ 4264.948315] PGD 0
03:57:07:[ 4264.948761] Oops: 0002 [#1] SMP
03:57:08:[ 4264.949076] Modules linked in: osp(OEN) mdd(OEN) lod(OEN) mdt(OEN) lfsck(OEN) mgc(OEN) osd_ldiskfs(OEN) lquota(OEN) fid(OEN) fld(OEN) ksocklnd(OEN) ptlrpc(OEN) obdclass(OEN) lnet(OEN) sha512_generic(E) crypto_null(E) libcfs(OEN) ldiskfs(OEN) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) sunrpc(E) fscache(E) iscsi_boot_sysfs(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) af_packet(E) rdma_cm(E) ib_cm(E) iw_cm(E) ib_sa(E) ib_mad(E) ib_core(E) ib_addr(E) ppdev(E) parport_pc(E) pvpanic(E) serio_raw(E) parport(E) pcspkr(E) virtio_balloon(E) 8139too(E) 8139cp(E) mii(E) button(E) processor(E) i2c_piix4(E) dm_mod(E) ext4(E) crc16(E) mbcache(E) jbd2(E) ata_generic(E) ata_piix(E) ahci(E) libahci(E) virtio_blk(E) floppy(E) uhci_hcd(E) ehci_hcd(E) cirrus(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) drm_kms_helper(E) usbcore(E) usb_common(E) ttm(E) drm(E) virtio_pci(E) virtio_ring(E) virtio(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
03:57:08:[ 4264.949076] Supported: No, Unsupported modules are loaded
03:57:08:[ 4264.949076] CPU: 1 PID: 2567 Comm: mdt00_002 Tainted: G OEN 3.12.44-52.10_lustre.gb2a3954-default #1
03:57:08:[ 4264.949076] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
03:57:08:[ 4264.949076] task: ffff88007acaa040 ti: ffff88007acb8000 task.ti: ffff88007acb8000
03:57:09:[ 4264.949076] RIP: 0010:[<ffffffff8151853a>] [<ffffffff8151853a>] _raw_spin_lock+0xa/0x30
03:57:09:[ 4264.949076] RSP: 0018:ffff88007acb9980 EFLAGS: 00010246
03:57:09:[ 4264.949076] RAX: 0000000000010000 RBX: ffff88006bc77400 RCX: 0000000000000007
03:57:09:[ 4264.949076] RDX: ffffc9800335e000 RSI: 0000000000000000 RDI: ffffc9800335e000
03:57:09:[ 4264.949076] RBP: ffff88007b7f3b40 R08: 00000000000000ec R09: 00000000000000ec
03:57:09:[ 4264.949076] R10: 0000000000000025 R11: 000000000000000e R12: ffff88007b08b140
03:57:09:[ 4264.949076] R13: 0000000000000000 R14: 000000000000000d R15: 0000000000000001
03:57:09:[ 4264.949076] FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
03:57:09:[ 4264.949076] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
03:57:09:[ 4264.949076] CR2: ffffc9800335e000 CR3: 0000000077c3d000 CR4: 00000000000006e0
03:57:09:[ 4264.949076] Stack:
03:57:09:[ 4264.949076] ffffffffa09e45e7 ffff880036e4d490 ffffc9800335e000 ffffffff00001886
03:57:09:[ 4264.949076] ffff88007b7f3b40 ffff88007acb9a48 ffff88006bc77400 0000000000000000
03:57:10:[ 4264.949076] 000000000000000d ffff88007b7f3b40 ffffffffa09dd170 0000000000000000
03:57:10:[ 4264.949076] Call Trace:
03:57:10:[ 4264.949076] [<ffffffffa09e45e7>] ldlm_resource_get+0x67/0xa30 [ptlrpc]
03:57:10:[ 4264.949076] [<ffffffffa09dd170>] ldlm_lock_create+0x60/0xb30 [ptlrpc]
03:57:10:[ 4264.949076] [<ffffffffa09f9f2e>] ldlm_cli_enqueue_local+0xce/0x950 [ptlrpc]
03:57:10:[ 4264.949076] [<ffffffffa0df972a>] mdt_object_local_lock+0x1ea/0xad0 [mdt]
03:57:10:[ 4264.949076] [<ffffffffa0dfacb1>] mdt_getattr_name_lock+0x9f1/0x18a0 [mdt]
03:57:10:[ 4264.949076] [<ffffffffa0dfbdef>] mdt_intent_getattr+0x28f/0x440 [mdt]
03:57:11:[ 4264.949076] [<ffffffffa0dfef2c>] mdt_intent_policy+0x59c/0xb50 [mdt]
03:57:11:[ 4264.949076] [<ffffffffa09ddf63>] ldlm_lock_enqueue+0x323/0x890 [ptlrpc]
03:57:11:[ 4264.949076] [<ffffffffa0a06361>] ldlm_handle_enqueue0+0x741/0x1870 [ptlrpc]
03:57:11:[ 4264.949076] [<ffffffffa0a897fd>] tgt_enqueue+0x5d/0x210 [ptlrpc]
03:57:11:[ 4264.949076] [<ffffffffa0a8dd33>] tgt_request_handle+0x7e3/0x1190 [ptlrpc]
03:58:35:[ 4264.949076] [<ffffffffa0a37aa9>] ptlrpc_server_handle_request+0x209/0xa70 [ptlrpc]
03:58:36:[ 4264.949076] [<ffffffffa0a3b1ba>] ptlrpc_main+0xb2a/0x1ea0 [ptlrpc]
03:58:37:[ 4264.949076] [<ffffffff810770f4>] kthread+0xb4/0xc0
03:58:37:[ 4264.949076] [<ffffffff81520618>] ret_from_fork+0x58/0x90
03:58:37:[ 4264.949076] Code: fa 66 0f 1f 44 00 00 48 83 c7 04 f0 ff 0f 74 05 e8 fc 28 d9 ff 48 89 d0 c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 00 00 01 00 <f0> 0f c1 07 89 c2 c1 ea 10 66 39 c2 89 d1 75 01 c3 0f b7 07 66
03:58:38:[ 4264.949076] RIP [<ffffffff8151853a>] _raw_spin_lock+0xa/0x30
03:58:38:[ 4264.949076] RSP <ffff88007acb9980>
03:58:38:[ 4264.949076] CR2: ffffc9800335e000
03:58:38:[ 0.004005] Failed to access perfctr msr (MSR c1 is 0)
03:58:38:[ 1.334430] systemd[1]: /usr/lib/systemd/system-generators/kdump-device-timeout-generator exited with exit status 2.
03:58:39:[ 4.691663] irq 11: nobody cared (try booting with the "irqpoll" option)
03:58:39:[ 4.692007] handlers:
03:58:39:[ 4.692007] [<ffffffffa00caf80>] usb_hcd_irq [usbcore]
03:58:39:[ 4.692007] Disabling IRQ #11
03:58:39:Unable to ioctl(KDSETLED) -- are you not on the console? (Inappropriate ioctl for device)
03:58:39:Deletion of old dump only on local disk.
03:58:39:Extracting dmesg
03:58:40:-------------------------------------------------------------------------------
03:58:40:
03:58:41:The dmesg log is saved to /mnt/2015-09-23-20:57/dmesg.txt.
03:58:42:
03:58:42:makedumpfile Completed.
03:58:42:-------------------------------------------------------------------------------
03:58:42:Saving dump using makedumpfile
03:58:42:-------------------------------------------------------------------------------
03:58:42: Excluding unnecessary pages : [ 0.0 %] / Excluding unnecessary pages : [100.0 %] | Excluding unnecessary pages : [100.0 %] \ Excluding unnecessary pages : [ 0.0 %] - Excluding unnecessary pages : [100.0 %] /[ 7.347488] Out of memory: Kill process 77 (haveged) score 34 or sacrifice child
03:58:42:[ 7.348136] Killed process 77 (haveged) total-vm:12032kB, anon-rss:3124kB, file-rss:652kB
05:17:27:********** Timeout by autotest system ********** |
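Triage note on the console log above: the sketch below decodes the `Oops:` error code and lists the call-trace frames. It is a minimal illustration, not part of Lustre or the autotest tooling. The helper names `decode_oops_code` and `call_trace` and the shortened `sample` string are hypothetical, and the bit meanings assume the standard x86 page-fault error code layout. For `0002` it reports a kernel-mode write to a not-present page, which is consistent with CR2 and RDI both holding ffffc9800335e000 when `_raw_spin_lock` was entered from `ldlm_resource_get`.

```python
import re

# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]

# Matches frames such as "[<ffffffffa09e45e7>] ldlm_resource_get+0x67/0xa30 [ptlrpc]".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def decode_oops_code(code_hex):
    """Translate the hex value printed after 'Oops:' into readable flags."""
    code = int(code_hex, 16)
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if code & mask else if_clear
        if text:
            flags.append(text)
    return flags


def call_trace(log):
    """Return (symbol, module) pairs between 'Call Trace:' and 'Code:'."""
    _, _, tail = log.partition("Call Trace:")
    frames_text = tail.partition("Code:")[0]
    return [
        (m.group(1), m.group(2) or "kernel")
        for m in FRAME_RE.finditer(frames_text)
    ]


# Shortened, hypothetical excerpt in the same shape as the console log.
sample = (
    "Oops: 0002 [#1] SMP\n"
    "Call Trace:\n"
    " [<ffffffffa09e45e7>] ldlm_resource_get+0x67/0xa30 [ptlrpc]\n"
    " [<ffffffffa09dd170>] ldlm_lock_create+0x60/0xb30 [ptlrpc]\n"
    " [<ffffffff810770f4>] kthread+0xb4/0xc0\n"
    "Code: fa 66\n"
    "RIP [<ffffffff8151853a>] _raw_spin_lock+0xa/0x30\n"
)

print(decode_oops_code("0002"))
for symbol, module in call_trace(sample):
    print(f"{module}: {symbol}")
```

Cutting the text at `Code:` keeps the trailing `RIP` line from being counted as an extra frame.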
| Comment by Gerrit Updater [ 13/Oct/15 ] |
|
Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/16804 |
| Comment by Gerrit Updater [ 30/Nov/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16804/ |
| Comment by Joseph Gmitter (Inactive) [ 30/Nov/15 ] |
|
Landed for 2.8 |