[LU-1913] Test failure on test suite performance-sanity, subtest test_3 Created: 12/Sep/12 Updated: 29/May/17 Resolved: 29/May/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 1 |
| Labels: | None |
| Environment: | server: b2_3/build#16 RHEL6 |
| Severity: | 3 |
| Rank (Obsolete): | 10289 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/693d643e-fa51-11e1-887d-52540035b04c. The sub-test test_3 failed with the following error:
MDS console log shows:

23:02:04:Lustre: DEBUG MARKER: /usr/sbin/lctl mark ===== mdsrate-create-small.sh ### 2 NODES CREATE with 3 threads per client ###
23:02:04:Lustre: DEBUG MARKER: ===== mdsrate-create-small.sh
23:02:04:BUG: unable to handle kernel paging request at 0000000781bbc060
23:02:04:IP: [<ffffffff8104f1c7>] resched_task+0x17/0x80
23:02:04:PGD 374e5067 PUD 0
23:02:04:Thread overran stack, or stack corrupted
23:02:04:Oops: 0000 [#1] SMP
23:02:04:last sysfs file: /sys/devices/system/cpu/possible
23:02:04:CPU 0
23:02:04:Modules linked in: osd_ldiskfs(U) fsfilt_ldiskfs(U) ldiskfs(U) lustre(U) obdfilter(U) ost(U) cmm(U) mdt(U) mdd(U) mds(U) mgs(U) obdecho(U) mgc(U) lquota(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) sha512_generic sha256_generic jbd2 nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
23:02:04:
23:02:04:Pid: 21654, comm: mdt00_001 Not tainted 2.6.32-279.5.1.el6_lustre.x86_64 #1 Red Hat KVM
23:02:04:RIP: 0010:[<ffffffff8104f1c7>] [<ffffffff8104f1c7>] resched_task+0x17/0x80
23:02:04:RSP: 0018:ffff880002203de8 EFLAGS: 00010087
23:02:04:RAX: 00000000000166c0 RBX: ffff880068462b18 RCX: 00000000ffff8800
23:02:04:RDX: ffff880031c2a000 RSI: 0000000000000400 RDI: ffff880068462ae0
23:02:04:RBP: ffff880002203de8 R08: 0000000000989680 R09: 0000000000000000
23:02:04:R10: 0000000000000010 R11: 0000000000000000 R12: ffff880002216728
23:02:04:R13: 0000000000000000 R14: 0000000000000000 R15: ffff880068462ae0
23:02:04:FS: 00007f0480a26700(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
23:02:04:CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
23:02:04:CR2: 0000000781bbc060 CR3: 000000007089b000 CR4: 00000000000006f0
23:02:04:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
23:02:04:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
23:02:04:Process mdt00_001 (pid: 21654, threadinfo ffff880031c2a000, task ffff880068462ae0)
23:02:04:Stack:
23:02:04: ffff880002203e18 ffffffff8105484c ffff8800022166c0 0000000000000000
23:02:04:<d> 00000000000166c0 0000000000000000 ffff880002203e58 ffffffff81057fa1
23:02:04:<d> ffff880002203e58 ffff880068462ae0 0000000000000000 0000000000000000
23:02:04:Call Trace:
23:02:04: <IRQ>
23:02:04: [<ffffffff8105484c>] task_tick_fair+0x14c/0x160
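For reference, the faulting instruction (resched_task+0x17/0x80 on kernel 2.6.32-279.5.1.el6_lustre.x86_64) can usually be mapped back to a source line once the matching kernel-debuginfo package is available. A minimal sketch, assuming the stock RHEL6 debuginfo layout; the vmlinux path below is an assumption, not taken from the logs:

# vmlinux with debug info for the crashed kernel (path is an assumption).
VMLINUX=/usr/lib/debug/lib/modules/2.6.32-279.5.1.el6_lustre.x86_64/vmlinux

# This kernel predates KASLR, so the RIP from the oops maps directly onto vmlinux.
addr2line -e "$VMLINUX" -f -i ffffffff8104f1c7

# Equivalent lookup using the symbol+offset form reported in the oops.
gdb -batch -ex 'list *(resched_task+0x17)' "$VMLINUX"

Either command should point at the statement inside resched_task() that dereferenced the bad pointer, which helps judge whether the access to 0000000781bbc060 came from a corrupted task pointer or from the stack overrun the oops warns about.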
| Comments |
| Comment by Keith Mannthey (Inactive) [ 12/Sep/12 ] |
|
A dmesg or /var/log/messages from the MDS and both clients would be very handy. client-27vm5 was client 1, but there is no sign of it in the logs. Do we know what happened to it? Did it reboot and get lost? It looks like client-27vm5 (and the other client) were trying to load an IB driver (open_hca), which seems odd for VMs to do. With a PUD of 0 and no PMD/PTE, I would say the kernel really accessed a non-existent address rather than a valid address that was somehow corrupted by hardware. Given "Thread overran stack, or stack corrupted", we may very well have had some sort of stack overflow. |
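Gathering the requested logs on future runs is easy to script. A minimal collection sketch, assuming passwordless ssh to the test nodes; the MDS and second-client hostnames below are placeholders, not taken from the logs:

# Collect dmesg and /var/log/messages from the MDS and both clients.
# "mds-node" and "client-27vm6" are placeholder hostnames; client-27vm5 is from the report.
for node in mds-node client-27vm5 client-27vm6; do
    ssh "$node" dmesg > "dmesg.$node" 2>&1
    ssh "$node" cat /var/log/messages > "messages.$node" 2>&1
done

Capturing these at failure time, before the VMs are recycled, would also answer whether client-27vm5 rebooted during the run.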
| Comment by Andreas Dilger [ 29/May/17 ] |
|
Close old ticket. |