[LU-1913] Test failure on test suite performance-sanity, subtest test_3
Created: 12/Sep/12  Updated: 29/May/17  Resolved: 29/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Cannot Reproduce
Votes: 1
Labels: None
Environment:

server: b2_3/build#16 RHEL6
client: b2_3/build#16 SLES11


Severity: 3
Rank (Obsolete): 10289

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/693d643e-fa51-11e1-887d-52540035b04c.

The sub-test test_3 failed with the following error:

test failed to respond and timed out

MDS console log shows

23:02:04:Lustre: DEBUG MARKER: /usr/sbin/lctl mark ===== mdsrate-create-small.sh ### 2 NODES CREATE with 3 threads per client ###
23:02:04:Lustre: DEBUG MARKER: ===== mdsrate-create-small.sh
23:02:04:BUG: unable to handle kernel paging request at 0000000781bbc060
23:02:04:IP: [<ffffffff8104f1c7>] resched_task+0x17/0x80
23:02:04:PGD 374e5067 PUD 0 
23:02:04:Thread overran stack, or stack corrupted
23:02:04:Oops: 0000 [#1] SMP 
23:02:04:last sysfs file: /sys/devices/system/cpu/possible
23:02:04:CPU 0 
23:02:04:Modules linked in: osd_ldiskfs(U) fsfilt_ldiskfs(U) ldiskfs(U) lustre(U) obdfilter(U) ost(U) cmm(U) mdt(U) mdd(U) mds(U) mgs(U) obdecho(U) mgc(U) lquota(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) sha512_generic sha256_generic jbd2 nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
23:02:04:
23:02:04:Pid: 21654, comm: mdt00_001 Not tainted 2.6.32-279.5.1.el6_lustre.x86_64 #1 Red Hat KVM
23:02:04:RIP: 0010:[<ffffffff8104f1c7>]  [<ffffffff8104f1c7>] resched_task+0x17/0x80
23:02:04:RSP: 0018:ffff880002203de8  EFLAGS: 00010087
23:02:04:RAX: 00000000000166c0 RBX: ffff880068462b18 RCX: 00000000ffff8800
23:02:04:RDX: ffff880031c2a000 RSI: 0000000000000400 RDI: ffff880068462ae0
23:02:04:RBP: ffff880002203de8 R08: 0000000000989680 R09: 0000000000000000
23:02:04:R10: 0000000000000010 R11: 0000000000000000 R12: ffff880002216728
23:02:04:R13: 0000000000000000 R14: 0000000000000000 R15: ffff880068462ae0
23:02:04:FS:  00007f0480a26700(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
23:02:04:CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
23:02:04:CR2: 0000000781bbc060 CR3: 000000007089b000 CR4: 00000000000006f0
23:02:04:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
23:02:04:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
23:02:04:Process mdt00_001 (pid: 21654, threadinfo ffff880031c2a000, task ffff880068462ae0)
23:02:04:Stack:
23:02:04: ffff880002203e18 ffffffff8105484c ffff8800022166c0 0000000000000000
23:02:04:<d> 00000000000166c0 0000000000000000 ffff880002203e58 ffffffff81057fa1
23:02:04:<d> ffff880002203e58 ffff880068462ae0 0000000000000000 0000000000000000
23:02:04:Call Trace:
23:02:04: <IRQ> 
23:02:04: [<ffffffff8105484c>] task_tick_fair+0x14c/0x160


 Comments   
Comment by Keith Mannthey (Inactive) [ 12/Sep/12 ]

A dmesg or /var/log/messages from the MDS and both the clients would be really handy.

client-27vm5 was client 1, but there is no sign of it in the logs. Do we know what happened to it? Did it reboot and get lost?

It looks like client-27-vm5 (and the other client) were trying to load an IB driver, open_hca (which seems odd for VMs to do).

With a PUD of 0 and no PMD/PTE, I would say we really asked for some nonexistent address rather than a valid address that was somehow corrupted by hardware. With "Thread overran stack, or stack corrupted", we may very well have had some sort of stack overflow.
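
To make the page-table reasoning above concrete, here is a minimal sketch that splits the faulting address (CR2) from the oops into page-table indices and checks the present bit of the two entries the kernel printed. It assumes standard x86_64 4-level paging with 4 KiB pages, which is what a 2.6.32 x86_64 kernel uses; the function names are hypothetical helpers, and the constants are copied from the console log in the description.

```python
# Sketch only: assumes x86_64 4-level paging with 4 KiB pages.
# Helper names are illustrative, not part of any Lustre or kernel tool.

def split_x86_64_vaddr(addr):
    """Split a virtual address into 9-bit table indices plus page offset."""
    return {
        "pgd": (addr >> 39) & 0x1FF,   # bits 47..39
        "pud": (addr >> 30) & 0x1FF,   # bits 38..30
        "pmd": (addr >> 21) & 0x1FF,   # bits 29..21
        "pte": (addr >> 12) & 0x1FF,   # bits 20..12
        "offset": addr & 0xFFF,        # bits 11..0
    }

def entry_present(entry):
    """Bit 0 of a page-table entry is the Present flag."""
    return bool(entry & 1)

# Values taken from the oops: "CR2: 0000000781bbc060" and "PGD 374e5067 PUD 0".
CR2 = 0x0000000781BBC060
PGD_ENTRY = 0x374E5067
PUD_ENTRY = 0x0

indices = split_x86_64_vaddr(CR2)
print(indices)
print("PGD present:", entry_present(PGD_ENTRY))
print("PUD present:", entry_present(PUD_ENTRY))
```

Under these assumptions the address decodes to PGD index 0, PUD index 30, PMD index 13, PTE index 444 and offset 0x60. The PGD entry is present but the PUD entry is not, so the walk stops at the PUD level, which is consistent with an address that was never mapped rather than a mapped page that went bad.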

Comment by Andreas Dilger [ 29/May/17 ]

Close old ticket.
