[LU-7098] sanity test_17m: test failed to respond and timed out Created: 03/Sep/15  Updated: 08/Jul/16  Resolved: 30/Nov/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
Related
is related to LU-7305 lustre-initialization-1 lustre-initia... Closed
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/1d53141c-519f-11e5-84c4-5254006e85c2.

The sub-test test_17m failed with the following error:

test failed to respond and timed out

Seen in testing of sles12 client/server on master.

Info required for matching: sanity 17m



 Comments   
Comment by Bob Glossman (Inactive) [ 14/Sep/15 ]

Another instance seen on master, also sles12 client/server:
https://testing.hpdd.intel.com/test_sets/b3dea53e-5a5c-11e5-825b-5254006e85c2

This may be a blocker for landing sles12 on master. Whatever the problem is, it seems to be master only; sles12 tests on other branches have passed.

Comment by Bob Glossman (Inactive) [ 24/Sep/15 ]

another instance seen in sles12 client/server test on master:
https://testing.hpdd.intel.com/test_sets/878101f8-62bd-11e5-a45a-5254006e85c2

from console log for mds2

03:57:07:onyx-44vm7 login: [ 4264.945248] BUG: unable to handle kernel paging request at ffffc9800335e000
03:57:07:[ 4264.947111] IP: [<ffffffff8151853a>] _raw_spin_lock+0xa/0x30
03:57:07:[ 4264.948315] PGD 0 
03:57:07:[ 4264.948761] Oops: 0002 [#1] SMP 
03:57:08:[ 4264.949076] Modules linked in: osp(OEN) mdd(OEN) lod(OEN) mdt(OEN) lfsck(OEN) mgc(OEN) osd_ldiskfs(OEN) lquota(OEN) fid(OEN) fld(OEN) ksocklnd(OEN) ptlrpc(OEN) obdclass(OEN) lnet(OEN) sha512_generic(E) crypto_null(E) libcfs(OEN) ldiskfs(OEN) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) sunrpc(E) fscache(E) iscsi_boot_sysfs(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) af_packet(E) rdma_cm(E) ib_cm(E) iw_cm(E) ib_sa(E) ib_mad(E) ib_core(E) ib_addr(E) ppdev(E) parport_pc(E) pvpanic(E) serio_raw(E) parport(E) pcspkr(E) virtio_balloon(E) 8139too(E) 8139cp(E) mii(E) button(E) processor(E) i2c_piix4(E) dm_mod(E) ext4(E) crc16(E) mbcache(E) jbd2(E) ata_generic(E) ata_piix(E) ahci(E) libahci(E) virtio_blk(E) floppy(E) uhci_hcd(E) ehci_hcd(E) cirrus(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) drm_kms_helper(E) usbcore(E) usb_common(E) ttm(E) drm(E) virtio_pci(E) virtio_ring(E) virtio(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
03:57:08:[ 4264.949076] Supported: No, Unsupported modules are loaded
03:57:08:[ 4264.949076] CPU: 1 PID: 2567 Comm: mdt00_002 Tainted: G           OEN  3.12.44-52.10_lustre.gb2a3954-default #1
03:57:08:[ 4264.949076] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
03:57:08:[ 4264.949076] task: ffff88007acaa040 ti: ffff88007acb8000 task.ti: ffff88007acb8000
03:57:09:[ 4264.949076] RIP: 0010:[<ffffffff8151853a>]  [<ffffffff8151853a>] _raw_spin_lock+0xa/0x30
03:57:09:[ 4264.949076] RSP: 0018:ffff88007acb9980  EFLAGS: 00010246
03:57:09:[ 4264.949076] RAX: 0000000000010000 RBX: ffff88006bc77400 RCX: 0000000000000007
03:57:09:[ 4264.949076] RDX: ffffc9800335e000 RSI: 0000000000000000 RDI: ffffc9800335e000
03:57:09:[ 4264.949076] RBP: ffff88007b7f3b40 R08: 00000000000000ec R09: 00000000000000ec
03:57:09:[ 4264.949076] R10: 0000000000000025 R11: 000000000000000e R12: ffff88007b08b140
03:57:09:[ 4264.949076] R13: 0000000000000000 R14: 000000000000000d R15: 0000000000000001
03:57:09:[ 4264.949076] FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
03:57:09:[ 4264.949076] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
03:57:09:[ 4264.949076] CR2: ffffc9800335e000 CR3: 0000000077c3d000 CR4: 00000000000006e0
03:57:09:[ 4264.949076] Stack:
03:57:09:[ 4264.949076]  ffffffffa09e45e7 ffff880036e4d490 ffffc9800335e000 ffffffff00001886
03:57:09:[ 4264.949076]  ffff88007b7f3b40 ffff88007acb9a48 ffff88006bc77400 0000000000000000
03:57:10:[ 4264.949076]  000000000000000d ffff88007b7f3b40 ffffffffa09dd170 0000000000000000
03:57:10:[ 4264.949076] Call Trace:
03:57:10:[ 4264.949076]  [<ffffffffa09e45e7>] ldlm_resource_get+0x67/0xa30 [ptlrpc]
03:57:10:[ 4264.949076]  [<ffffffffa09dd170>] ldlm_lock_create+0x60/0xb30 [ptlrpc]
03:57:10:[ 4264.949076]  [<ffffffffa09f9f2e>] ldlm_cli_enqueue_local+0xce/0x950 [ptlrpc]
03:57:10:[ 4264.949076]  [<ffffffffa0df972a>] mdt_object_local_lock+0x1ea/0xad0 [mdt]
03:57:10:[ 4264.949076]  [<ffffffffa0dfacb1>] mdt_getattr_name_lock+0x9f1/0x18a0 [mdt]
03:57:10:[ 4264.949076]  [<ffffffffa0dfbdef>] mdt_intent_getattr+0x28f/0x440 [mdt]
03:57:11:[ 4264.949076]  [<ffffffffa0dfef2c>] mdt_intent_policy+0x59c/0xb50 [mdt]
03:57:11:[ 4264.949076]  [<ffffffffa09ddf63>] ldlm_lock_enqueue+0x323/0x890 [ptlrpc]
03:57:11:[ 4264.949076]  [<ffffffffa0a06361>] ldlm_handle_enqueue0+0x741/0x1870 [ptlrpc]
03:57:11:[ 4264.949076]  [<ffffffffa0a897fd>] tgt_enqueue+0x5d/0x210 [ptlrpc]
03:57:11:[ 4264.949076]  [<ffffffffa0a8dd33>] tgt_request_handle+0x7e3/0x1190 [ptlrpc]
03:58:35:[ 4264.949076]  [<ffffffffa0a37aa9>] ptlrpc_server_handle_request+0x209/0xa70 [ptlrpc]
03:58:36:[ 4264.949076]  [<ffffffffa0a3b1ba>] ptlrpc_main+0xb2a/0x1ea0 [ptlrpc]
03:58:37:[ 4264.949076]  [<ffffffff810770f4>] kthread+0xb4/0xc0
03:58:37:[ 4264.949076]  [<ffffffff81520618>] ret_from_fork+0x58/0x90
03:58:37:[ 4264.949076] Code: fa 66 0f 1f 44 00 00 48 83 c7 04 f0 ff 0f 74 05 e8 fc 28 d9 ff 48 89 d0 c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 00 00 01 00 <f0> 0f c1 07 89 c2 c1 ea 10 66 39 c2 89 d1 75 01 c3 0f b7 07 66 
03:58:38:[ 4264.949076] RIP  [<ffffffff8151853a>] _raw_spin_lock+0xa/0x30
03:58:38:[ 4264.949076]  RSP <ffff88007acb9980>
03:58:38:[ 4264.949076] CR2: ffffc9800335e000
03:58:38:[    0.004005] Failed to access perfctr msr (MSR c1 is 0)
03:58:38:[    1.334430] systemd[1]: /usr/lib/systemd/system-generators/kdump-device-timeout-generator exited with exit status 2.
03:58:39:[    4.691663] irq 11: nobody cared (try booting with the "irqpoll" option)
03:58:39:[    4.692007] handlers:
03:58:39:[    4.692007] [<ffffffffa00caf80>] usb_hcd_irq [usbcore]
03:58:39:[    4.692007] Disabling IRQ #11
03:58:39:Unable to ioctl(KDSETLED) -- are you not on the console? (Inappropriate ioctl for device)
03:58:39:Deletion of old dump only on local disk.
03:58:39:Extracting dmesg
03:58:40:-------------------------------------------------------------------------------
03:58:40:
03:58:41:The dmesg log is saved to /mnt/2015-09-23-20:57/dmesg.txt.
03:58:42:
03:58:42:makedumpfile Completed.
03:58:42:-------------------------------------------------------------------------------
03:58:42:Saving dump using makedumpfile
03:58:42:-------------------------------------------------------------------------------
03:58:42:
Excluding unnecessary pages        : [100.0 %]
[    7.347488] Out of memory: Kill process 77 (haveged) score 34 or sacrifice child
03:58:42:[    7.348136] Killed process 77 (haveged) total-vm:12032kB, anon-rss:3124kB, file-rss:652kB
05:17:27:********** Timeout by autotest system **********
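
When comparing several console logs like the one above, it can help to reduce each oops to its list of call-trace frames. The following is a minimal sketch in Python, not part of the Lustre test tooling: the names `FRAME_RE`, `parse_frames`, and `SAMPLE` are hypothetical, and the pattern assumes frames are printed in the `[<address>] function+0xoffset/0xsize [module]` form seen in this log.

```python
import re

# Matches frames such as:
#   [<ffffffffa09e45e7>] ldlm_resource_get+0x67/0xa30 [ptlrpc]
# The trailing "[module]" is optional; core kernel symbols have none.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?P<func>[A-Za-z0-9_.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_frames(log_text):
    """Return (function, module) pairs for the frames after 'Call Trace:'."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # the first non-frame line ends the trace
        frames.append((match.group("func"), match.group("module") or "kernel"))
    return frames


# A shortened excerpt of the mds2 console log quoted in this comment.
SAMPLE = """\
03:57:10:[ 4264.949076] Call Trace:
03:57:10:[ 4264.949076]  [<ffffffffa09e45e7>] ldlm_resource_get+0x67/0xa30 [ptlrpc]
03:57:10:[ 4264.949076]  [<ffffffffa09dd170>] ldlm_lock_create+0x60/0xb30 [ptlrpc]
03:57:10:[ 4264.949076]  [<ffffffffa0df972a>] mdt_object_local_lock+0x1ea/0xad0 [mdt]
03:58:37:[ 4264.949076]  [<ffffffff810770f4>] kthread+0xb4/0xc0
03:58:37:[ 4264.949076] Code: fa 66 0f
"""

if __name__ == "__main__":
    for func, module in parse_frames(SAMPLE):
        print(f"{module:8s} {func}")
```

Frames without a bracketed module name are labelled `kernel` here purely for display; that label is a convention of this sketch, not something the kernel prints.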
Comment by Gerrit Updater [ 13/Oct/15 ]

Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/16804
Subject: LU-7098 osd-ldiskfs: don't alloc inode directly
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 79a01fd4c0308fc3e242bed7e0d4efe647cd523e

Comment by Gerrit Updater [ 30/Nov/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16804/
Subject: LU-7098 osd-ldiskfs: don't alloc inode directly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 323293bab2c8d65c6f1c0b3c04671ed073719cbe

Comment by Joseph Gmitter (Inactive) [ 30/Nov/15 ]

Landed for 2.8
