Details
Bug
Resolution: Cannot Reproduce
Minor
Description
This issue was created by maloo for Chris Horn <hornc@cray.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/266b7a77-975d-473d-963d-f78a22e7209f
test_1 failed with the following error:
Timeout occurred after 65 mins, last suite running was runtests
Maloo tagged this failure as LU-13065, but it looks different to me.
The last lines in the suite log show clients being unmounted, not OSTs as in LU-13065:
echo Stopping client \$(hostname) /mnt/lustre opts:;
lsof /mnt/lustre || need_kill=no;
if [ x != x -a x\$need_kill != xno ]; then
pids=\$(lsof -t /mnt/lustre | sort -u);
if [ -n \"\$pids\" ]; then
kill -9 \$pids;
fi
fi;
while umount /mnt/lustre 2>&1 | grep -q busy; do
echo /mnt/lustre is still busy, wait one second && sleep 1;
done;
fi
The console log for client 2 (trevis-53vm2) shows hung mount.nfs tasks:
[ 720.526224] INFO: task mount.nfs:1314 blocked for more than 120 seconds.
[ 720.527710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 720.529043] mount.nfs D ffff8e4c7bf2a0e0 0 1314 1313 0x00000080
[ 720.530429] Call Trace:
[ 720.530899] [] schedule+0x29/0x70
[ 720.531762] [] schedule_timeout+0x221/0x2d0
[ 720.533040] [] ? __slab_free+0x1b0/0x290
[ 720.534069] [] ? svc_close_list+0x88/0xa0 [sunrpc]
[ 720.535194] [] wait_for_completion+0xfd/0x140
[ 720.536316] [] ? wake_up_state+0x20/0x20
[ 720.537321] [] kthread_stop+0x4a/0xf0
[ 720.538291] [] nfs_callback_down+0x53/0xd0 [nfsv4]
[ 720.539452] [] nfs4_free_client+0x4f/0xc0 [nfsv4]
[ 720.540600] [] nfs_put_client+0xf2/0x140 [nfs]
[ 720.541706] [] nfs4_init_client+0x193/0x2f0 [nfsv4]
[ 720.542908] [] ? kmem_cache_alloc+0x35/0x1f0
[ 720.543984] [] ? __fscache_acquire_cookie+0x66/0x180 [fscache]
[ 720.545309] [] ? __fscache_acquire_cookie+0x66/0x180 [fscache]
[ 720.546652] [] ? __rpc_init_priority_wait_queue+0x81/0xc0 [sunrpc]
[ 720.548038] [] ? rpc_init_wait_queue+0x13/0x20 [sunrpc]
[ 720.549251] [] nfs_get_client+0x353/0x470 [nfs]
[ 720.550352] [] nfs4_set_client+0x9c/0x150 [nfsv4]
[ 720.551492] [] nfs4_create_server+0x13e/0x3b0 [nfsv4]
[ 720.552722] [] nfs4_remote_mount+0x2e/0x60 [nfsv4]
[ 720.553887] [] mount_fs+0x3e/0x1b0
[ 720.554860] [] ? __alloc_percpu+0x15/0x20
[ 720.555900] [] vfs_kern_mount+0x67/0x110
[ 720.556909] [] nfs_do_root_mount+0x86/0xc0 [nfsv4]
[ 720.558073] [] nfs4_try_mount+0x44/0xc0 [nfsv4]
[ 720.559168] [] ? get_nfs_version+0x27/0x90 [nfs]
[ 720.560301] [] nfs_fs_mount+0x4cb/0xdc0 [nfs]
[ 720.561402] [] ? nfs_clone_super+0x140/0x140 [nfs]
[ 720.562568] [] ? param_set_portnr+0x70/0x70 [nfs]
[ 720.563682] [] mount_fs+0x3e/0x1b0
[ 720.564627] [] ? __alloc_percpu+0x15/0x20
[ 720.565694] [] vfs_kern_mount+0x67/0x110
[ 720.566695] [] do_mount+0x1ef/0xce0
[ 720.567627] [] ? copy_mount_options+0xc0/0x170
[ 720.568725] [] SyS_mount+0x83/0xd0
[ 720.569643] [] system_call_fastpath+0x25/0x2a
[ 720.570704] [] ? system_call_after_swapgs+0xae/0x146
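For triaging similar console logs, a rough sketch like the following can pull out the hung tasks and the frames the kernel did not mark with "?" (the "?" frames are unreliable stack leftovers). The function names `hung_tasks` and `reliable_frames` are hypothetical helpers, not part of Lustre or Maloo, and the regular expressions assume the message layout seen in the trace above.

```python
import re

# Header printed by the kernel hung-task detector, e.g.
# "INFO: task mount.nfs:1314 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")

# One stack frame: optional "?" marker, symbol+offset/size, optional [module].
# Assumes the frame follows a closing "]" (address or timestamp bracket).
FRAME_RE = re.compile(
    r"\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?")


def hung_tasks(console_text):
    """Return (command, pid, seconds) for each hung-task report."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in HUNG_RE.finditer(console_text)]


def reliable_frames(console_text):
    """Return (symbol, module) for frames not marked '?'; module may be None."""
    frames = []
    for m in FRAME_RE.finditer(console_text):
        unreliable, symbol, module = m.groups()
        if not unreliable:
            frames.append((symbol, module))
    return frames
```

Applied to the trace above, the reliable frames would begin with schedule, schedule_timeout, wait_for_completion, kthread_stop and nfs_callback_down, which makes it easier to see that the task is blocked in the NFS client code path.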
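At a quick glance, I don't see anything wrong with Lustre: the blocked task is in the NFS client mount path, waiting in kthread_stop() called from nfs_callback_down().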
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
runtests test_1 - Timeout occurred after 65 mins, last suite running was runtests