[LU-6400] conf-sanity test_56: test failed to respond and timed out Created: 25/Mar/15 Updated: 31/Aug/15 Resolved: 27/Aug/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>. This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/12b10500-d319-11e4-94cf-5254006e85c2. While maloo reports a 0% failure rate, a history lookup shows a few of these failures in February and March. The sub-test test_56 failed with the following error: test failed to respond and timed out. Info required for matching: conf-sanity 56 |
| Comments |
| Comment by Bob Glossman (Inactive) [ 25/Mar/15 ] |
|
This may be specific to sles11sp3. I haven't checked all instances, but several of them are on sles11sp3 client/server. |
| Comment by Bob Glossman (Inactive) [ 25/Mar/15 ] |
|
I see that conf-sanity test_56 was once disabled in TEI and restored in TEI-2738. Maybe this is related. |
| Comment by Bob Glossman (Inactive) [ 26/Mar/15 ] |
|
another seen: |
| Comment by Oleg Drokin [ 27/Mar/15 ] |
|
So the issue at hand is that MDS1 crashed (dmesg is empty, with signs of a reboot). |
| Comment by Olaf Weber [ 30/Mar/15 ] |
|
I may be able to help a bit with the analysis, at least to the extent of answering the question whether my modifications appear to be to blame in this particular case, provided I can get a look at the dmesg output for the crash. (Looking at the actual core would be even better, but I don't know how feasible that would be.) |
| Comment by Oleg Drokin [ 11/Aug/15 ] |
|
Ok, so the lack of logs is due to timestamp in logs. What we see there is:
04:22:10:shadow-14vm4 login: [22423.103364] BUG: unable to handle kernel paging request at ffffffffffff8828
04:22:10:[22423.105214] IP: [<ffffffffa08004da>] class_setup+0x63a/0xad0 [obdclass]
04:22:10:[22423.105214] PGD 1a0b067 PUD 1a0c067 PMD 0
04:22:10:[22423.105214] Oops: 0002 [#1] SMP
04:22:10:[22423.105214] CPU 1
04:22:10:[22423.105214] Modules linked in: osp(EN) mdd(EN) lod(EN) mdt(EN) lfsck(EN) mgs(EN) mgc(EN) osd_ldiskfs(EN) lquota(EN) lustre(EN) lov(EN) mdc(EN) fid(EN) lmv(EN) fld(EN) ksocklnd(EN) ptlrpc(EN) obdclass(EN) lnet(EN) libcfs(EN) ldiskfs(EN) sha512_generic sha1_generic md5 crypto_null crc32c quota_v2 quota_tree jbd2 crc16 nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rdma_ucm rdma_cm iw_cm ib_addr ib_srp scsi_transport_srp scsi_tgt ib_ipoib ib_cm ib_uverbs ib_umad iw_cxgb3 cxgb3 mdio mlx4_en mlx4_ib ib_sa mlx4_core ib_mthca ib_mad ib_core mperf loop dm_mod ipv6 ipv6_lib 8139too floppy virtio_balloon rtc_cmos 8139cp i2c_piix4 mii pcspkr button ttm drm_kms_helper drm i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd usbcore usb_common intel_agp intel_gtt scsi_dh_emc scsi_dh_rdac scsi_dh_alua scsi_dh_hp_sw scsi_dh virtio_pci ata_generic virtio_blk virtio virtio_ring ata_piix edd ext3 mbcache jbd fan processor ahci libahci libata scsi_mod thermal thermal_sys hwmon [last unloaded: libcfs]
04:22:10:[22423.121496] Supported: No, Unsupported modules are loaded
04:22:10:[22423.121496]
04:22:10:[22423.121496] Pid: 31534, comm: llog_process_th Tainted: G EN 3.0.101-0.47.50-default #1 Red Hat KVM
04:22:10:[22423.121496] RIP: 0010:[<ffffffffa08004da>] [<ffffffffa08004da>] class_setup+0x63a/0xad0 [obdclass]
04:22:10:[22423.121496] RSP: 0018:ffff88005d8e9be0 EFLAGS: 00010287
04:22:10:[22423.121496] RAX: 0000000000000007 RBX: ffff880061f14e80 RCX: ffff8800375e3c00
04:22:10:[22423.121496] RDX: 0000000000000006 RSI: ffff8800375e3c00 RDI: 0000000000000286
04:22:10:[22423.121496] RBP: ffff880061f14e80 R08: 000000000000000a R09: 0000000000000010
04:22:10:[22423.121496] R10: 0000ffff0010ff10 R11: 0000000000000000 R12: 0000000000000000
04:22:10:[22423.121496] R13: ffff880061f14fd8 R14: ffffffffffff8800 R15: ffff88005d8e9c10
04:22:10:[22423.121496] FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
04:22:10:[22423.121496] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
04:22:10:[22423.121496] CR2: ffffffffffff8828 CR3: 000000007a9a7000 CR4: 00000000000006e0
04:22:10:[22423.121496] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
04:22:10:[22423.121496] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
04:22:10:[22423.121496] Process llog_process_th (pid: 31534, threadinfo ffff88005d8e8000, task ffff88006cf18280)
04:22:10:[22423.121496] Stack:
04:22:10:[22423.121496] ffff880000000800 ffffffffa087c940 551388d200000284 00000000000b2a9a
04:22:10:[22423.121496] 00007b2e00000000 ffff88006fbf4dc0 0000000410000003 0000000000000000
04:22:10:[22423.121496] 0000000000000000 ffff88005d8e9c28 ffff88005d8e9c28 0000000000000082
04:22:10:[22423.121496] Call Trace:
04:22:10:[22423.121496] [<ffffffffa0808605>] class_process_config+0xc95/0x18f0 [obdclass]
04:22:10:[22423.121496] [<ffffffffa080a448>] class_config_llog_handler+0x978/0x14d0 [obdclass]
04:22:10:[22423.121496] [<ffffffffa07ce3cd>] llog_process_thread+0x8bd/0xd10 [obdclass]
04:22:10:[22423.121496] [<ffffffffa07ce85a>] llog_process_thread_daemonize+0x3a/0x70 [obdclass]
04:22:10:[22423.121496] [<ffffffff81083fe6>] kthread+0x96/0xa0
04:22:10:[22423.121496] [<ffffffff8146dce4>] kernel_thread_helper+0x4/0x10
04:22:10:[22423.121496] Code: ff 48 89 44 24 60 49 8b 46 10 ff 10 4c 89 ff 49 89 c6 e8 4a 6b 01 00 49 81 fe 00 f0 ff ff 0f 87 3d 03 00 00 4c 89 b5 b8 00 00 00
04:22:10:[22423.121496] 89 6e 28 e9 7b fe ff ff 0f 1f 44 00 00 c7 05 1e 39 0a 00 20
04:22:10:[22423.121496] RIP [<ffffffffa08004da>] class_setup+0x63a/0xad0 [obdclass]
04:22:10:[22423.121496] RSP <ffff88005d8e9be0>
04:22:10:[22423.121496] CR2: ffffffffffff8828 |
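The register dump in the oops above supports a small arithmetic check. This is one reading of the trace, not a conclusion stated in this ticket. The faulting address in CR2 sits a short offset past the value in R14, and R14 looks like a small negative number rather than a valid kernel pointer. The sketch below uses only values copied from the trace plus the Linux kernel's `MAX_ERRNO` limit (4095) for error-encoded pointers. The helper names `as_signed64` and `is_err_value` are hypothetical and exist only for this illustration.

```python
# Values copied from the oops above.
CR2 = 0xffffffffffff8828   # faulting address reported by the kernel
R14 = 0xffffffffffff8800   # R14 at the time of the fault
MAX_ERRNO = 4095           # Linux kernel limit for ERR_PTR-encoded errnos


def as_signed64(value):
    """Interpret a 64-bit unsigned value as a two's-complement signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


def is_err_value(value):
    """Mirror the kernel's IS_ERR_VALUE(): true only for the top MAX_ERRNO addresses."""
    return value >= (1 << 64) - MAX_ERRNO


offset = CR2 - R14
print(f"CR2 - R14 = {offset:#x}")
print(f"R14 as signed 64-bit = {as_signed64(R14)}")
print(f"R14 in ERR_PTR range = {is_err_value(R14)}")
```

If this reading is right, the write that faulted targeted offset 0x28 from a value that an `IS_ERR()`-style check would not reject, because it lies outside the last 4095 addresses. Whether that is the actual root cause is an assumption; the patch at http://review.whamcloud.com/16008 is the authoritative source for the fix.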
| Comment by Peter Jones [ 11/Aug/15 ] |
|
Yang Sheng, could you please look into this issue? Thanks, Peter |
| Comment by Gerrit Updater [ 17/Aug/15 ] |
|
Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/16008 |
| Comment by Gerrit Updater [ 26/Aug/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16008/ |
| Comment by Yang Sheng [ 27/Aug/15 ] |
|
Patch landed. Close ticket. |
| Comment by Joseph Gmitter (Inactive) [ 27/Aug/15 ] |
|
Patch has landed for 2.8 |