[LU-1871] MDS oops in mdsrate-create-small.sh: Thread overran stack, or stack corrupted Created: 23/Aug/12  Updated: 12/Sep/12  Resolved: 12/Sep/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Sarah Liu Assignee: Lai Siyao
Resolution: Duplicate Votes: 0
Labels: releases
Environment:

server/client: lustre-b2_3/build #1/RHEL6


Severity: 3
Rank (Obsolete): 2219

 Description   

I think this is an MPI-related issue.
Test log from client 1:

[[55105,1],10]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: client-10vm1.lab.whamcloud.com

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
CMA: no RDMA devices found
CMA: no RDMA devices found
CMA: no RDMA devices found
CMA: no RDMA devices found
CMA: no RDMA devices found
CMA: no RDMA devices found
CMA: no RDMA devices found
CMA: no RDMA devices found
CMA: no RDMA devices found
CMA: no RDMA devices found
CMA: no RDMA devices found
CMA: no RDMA devices found
CMA: no RDMA devices found
CMA: no RDMA devices found
CMA: no RDMA devices found
CMA: no RDMA devices found
r= 0: create /mnt/lustre/d0.write_append_truncate/f0.wat, max size: 3703701, seed 1345680120: No such file or directory
r= 0 l=0000: WR A  203927/0x031c97, AP a  157830/0x026886, TR@  308317/0x04b45d
[client-10vm1.lab.whamcloud.com:12004] 15 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[client-10vm1.lab.whamcloud.com:12004] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
r= 0 l=1000: WR M  391981/0x05fb2d, AP m  363671/0x058c97, TR@  495994/0x07917a


 Comments   
Comment by Minh Diep [ 10/Sep/12 ]

Are there more logs (dmesg, console)? The test should not fail even with these messages.

Comment by Chris Gearing (Inactive) [ 10/Sep/12 ]

From Minh's comments this seems like a Lustre test issue. The Lustre test needs to be fixed to fall back appropriately to a slower device.

It is very hard to be sure because no link to the failing tests was provided; it's not even possible to know which test was running.

Comment by Minh Diep [ 10/Sep/12 ]

The 'lower performance' warning can be ignored: http://cac.engin.umich.edu/faq.html
Maybe this will help: http://www.open-mpi.org/community/lists/users/2012/02/18465.php

Comment by Minh Diep [ 10/Sep/12 ]

What changed was:

before: + su mpiuser sh -c "/usr/lib64/openmpi/bin/mpirun -mca boot ssh -mca btl tcp,self -np 1 -machinefile /tmp/mdsrate-create-small.machines /usr/lib64/lustre/tests/mdsrate --create --time 600 --nfiles 129674 --dir /mnt/lustre/mdsrate/single --filefmt 'f%%d' "

today: + su mpiuser sh -c "/usr/lib64/openmpi/bin/mpirun -mca boot ssh -np 6 -machinefile /tmp/mdsrate-create-small.machines /usr/lib64/lustre/tests/mdsrate --create --time 600 --nfiles 128386 --dir /mnt/lustre/mdsrate/multi --filefmt 'f%%d' "

We removed -mca btl tcp,self. However, those options only suppress the warning message; removing them should not cause any actual issue.

If I recall correctly, we removed those because of an issue when running on IB. Perhaps we need to check whether we are running on IB or TCP and set the appropriate options.
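
A minimal sketch of such a check, assuming the test wrapper can call lctl on the client (the variable name and the grep pattern are illustrative, not taken from the actual test scripts):

# Choose MPI BTL options based on the Lustre network type: if any NID is
# on o2ib, let Open MPI pick the openib BTL on its own; otherwise restrict
# to tcp,self so the "no relevant network interfaces" warning is not printed.
if lctl list_nids | grep -q '@o2ib'; then
    MPI_BTL_OPTS=""
else
    MPI_BTL_OPTS="-mca btl tcp,self"
fi
mpirun $MPI_BTL_OPTS -np 6 -machinefile /tmp/mdsrate-create-small.machines /usr/lib64/lustre/tests/mdsrate --create --time 600 --nfiles 128386 --dir /mnt/lustre/mdsrate/multi --filefmt 'f%%d'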

Comment by Minh Diep [ 10/Sep/12 ]

This test passed a couple of days before: https://maloo.whamcloud.com/test_sets/506f3352-fa9b-11e1-887d-52540035b04c
It started failing on 9/9: https://maloo.whamcloud.com/test_sets/693d643e-fa51-11e1-887d-52540035b04c

The client logs show that mdsrate was hung:

20:46:53:Lustre: 8471:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
20:47:24:INFO: task mdsrate:21390 blocked for more than 120 seconds.
20:47:24:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
20:47:24:mdsrate D 0000000000000000 0 21390 21387 0x00000080
20:47:24: ffff88004c1d7be8 0000000000000082 ffff88000000012d ffffffff0000012d
20:47:24: ffff880000000065 ffffffffa0b404a0 000000000000012d ffffffffa0715cc5
20:47:24: ffff880048d5c638 ffff88004c1d7fd8 000000000000fb88 ffff880048d5c638
20:47:24:Call Trace:
20:47:24: [<ffffffffa06c042d>] ? lustre_msg_buf+0x5d/0x60 [ptlrpc]
20:47:24: [<ffffffffa06ed4a6>] ? __req_capsule_get+0x176/0x750 [ptlrpc]
20:47:24: [<ffffffff814fefbe>] __mutex_lock_slowpath+0x13e/0x180
20:47:24: [<ffffffff814fee5b>] mutex_lock+0x2b/0x50
20:47:24: [<ffffffffa06609b3>] mdc_close+0x193/0x9a0 [mdc]
20:47:24: [<ffffffffa061b8c6>] lmv_close+0x2d6/0x5a0 [lmv]
20:47:24: [<ffffffffa099c7af>] ll_close_inode_openhandle+0x30f/0x1050 [lustre]
20:47:24: [<ffffffffa099d69a>] ll_md_real_close+0x1aa/0x220 [lustre]
20:47:24: [<ffffffffa099d96b>] ll_md_close+0x25b/0x760 [lustre]
20:47:24: [<ffffffff814ff546>] ? down_read+0x16/0x30
20:47:24: [<ffffffffa099df8b>] ll_file_release+0x11b/0x3e0 [lustre]
20:47:24: [<ffffffff8117ca65>] __fput+0xf5/0x210
20:47:24: [<ffffffff8117cba5>] fput+0x25/0x30
20:47:24: [<ffffffff811785cd>] filp_close+0x5d/0x90
20:47:24: [<ffffffff811786a5>] sys_close+0xa5/0x100
20:47:24: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
20:48:15:Lustre: 8471:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1347162469/real 1347162469] req@ffff88007a52dc00 x1412601028282197/t0(0) o250->MGC10.10.4.170@tcp@10.10.4.170@tcp:26/25 lens 400/544 e 0 to 1 dl 1347162494 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
20:48:15:Lustre: 8471:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
20:49:26:INFO: task mdsrate:21390 blocked for more than 120 seconds.
20:49:26:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Perhaps it is related to this Oops on the MDS:
https://maloo.whamcloud.com/test_sets/693d643e-fa51-11e1-887d-52540035b04c
https://maloo.whamcloud.com/test_logs/6a1db8f4-fa51-11e1-887d-52540035b04c/show_text

23:02:04:Lustre: DEBUG MARKER: ===== mdsrate-create-small.sh
23:02:04:BUG: unable to handle kernel paging request at 0000000781bbc060
23:02:04:IP: [<ffffffff8104f1c7>] resched_task+0x17/0x80
23:02:04:PGD 374e5067 PUD 0
23:02:04:Thread overran stack, or stack corrupted
23:02:04:Oops: 0000 [#1] SMP
23:02:04:last sysfs file: /sys/devices/system/cpu/possible
23:02:04:CPU 0
23:02:04:Modules linked in: osd_ldiskfs(U) fsfilt_ldiskfs(U) ldiskfs(U) lustre(U) obdfilter(U) ost(U) cmm(U) mdt(U) mdd(U) mds(U) mgs(U) obdecho(U) mgc(U) lquota(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) sha512_generic sha256_generic jbd2 nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
23:02:04:
23:02:04:Pid: 21654, comm: mdt00_001 Not tainted 2.6.32-279.5.1.el6_lustre.x86_64 #1 Red Hat KVM
23:02:04:RIP: 0010:[<ffffffff8104f1c7>] [<ffffffff8104f1c7>] resched_task+0x17/0x80
23:02:04:RSP: 0018:ffff880002203de8 EFLAGS: 00010087
23:02:04:RAX: 00000000000166c0 RBX: ffff880068462b18 RCX: 00000000ffff8800
23:02:04:RDX: ffff880031c2a000 RSI: 0000000000000400 RDI: ffff880068462ae0
23:02:04:RBP: ffff880002203de8 R08: 0000000000989680 R09: 0000000000000000
23:02:04:R10: 0000000000000010 R11: 0000000000000000 R12: ffff880002216728
23:02:04:R13: 0000000000000000 R14: 0000000000000000 R15: ffff880068462ae0
23:02:04:FS: 00007f0480a26700(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
23:02:04:CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
23:02:04:CR2: 0000000781bbc060 CR3: 000000007089b000 CR4: 00000000000006f0
23:02:04:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
23:02:04:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
23:02:04:Process mdt00_001 (pid: 21654, threadinfo ffff880031c2a000, task ffff880068462ae0)
23:02:04:Stack:
23:02:04: ffff880002203e18 ffffffff8105484c ffff8800022166c0 0000000000000000
23:02:04:<d> 00000000000166c0 0000000000000000 ffff880002203e58 ffffffff81057fa1
23:02:04:<d> ffff880002203e58 ffff880068462ae0 0000000000000000 0000000000000000
23:02:04:Call Trace:
23:02:04: <IRQ>

Comment by Andreas Dilger [ 10/Sep/12 ]

We need to run "checkstack" on the current master, as well as on an older Lustre (maybe b2_1), so that we can see which functions have grown in stack usage. Unfortunately, there is no stack trace that shows what the call path is.
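
A hedged sketch of such a comparison, assuming two built Lustre trees and a kernel source tree at /usr/src/linux (the tree names and module paths are illustrative):

# checkstack.pl ships with the kernel source; it parses objdump output and
# reports per-function stack frame sizes, largest first.
for tree in lustre-master lustre-b2_1; do
    objdump -d $tree/lustre/ptlrpc/ptlrpc.ko | \
        perl /usr/src/linux/scripts/checkstack.pl x86_64 > checkstack-$tree.txt
done
# Diff the two reports to see which functions grew between releases.
diff checkstack-lustre-b2_1.txt checkstack-lustre-master.txt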

Comment by Peter Jones [ 10/Sep/12 ]

Lai

Could you please look into this one?

Thanks

Peter

Comment by Lai Siyao [ 11/Sep/12 ]

I reproduced it once locally, but the result doesn't print a backtrace either. I'll try to reproduce it again and gather more useful information before the MDS oopses.

Comment by Lai Siyao [ 12/Sep/12 ]

This is a duplicate of LU-1881, and this test passes with the patch for LU-1881.
