[LU-13531] runtests test_1: Timeout occurred after 65 mins, last suite running was runtests
Created: 07/May/20  Updated: 27/Mar/23  Resolved: 27/Mar/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Cannot Reproduce
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

Description

This issue was created by maloo for Chris Horn <hornc@cray.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/266b7a77-975d-473d-963d-f78a22e7209f

test_1 failed with the following error:

Timeout occurred after 65 mins, last suite running was runtests

Maloo tagged this failure as LU-13065, but it looks different to me.

The last lines in the suite log show clients being unmounted, not OSTs as in LU-13065:

echo Stopping client \$(hostname) /mnt/lustre opts:;
lsof /mnt/lustre || need_kill=no;
if [ x != x -a x\$need_kill != xno ]; then
    pids=\$(lsof -t /mnt/lustre | sort -u);
    if [ -n \"\$pids\" ]; then
        kill -9 \$pids;
    fi
fi;
while umount  /mnt/lustre 2>&1 | grep -q busy; do
    echo /mnt/lustre is still busy, wait one second && sleep 1;
done;
fi
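
The unmount loop quoted above has no upper bound: it keeps retrying for as long as umount reports "busy". Whether that loop is where this run actually stalled is not established by the log. As a minimal sketch only, a bounded variant could look like the following. The helper name retry_while_busy and its arguments are hypothetical and are not part of the Lustre test framework.

```shell
# Hypothetical helper (not from the Lustre test framework).
# Re-run a command while its combined output mentions "busy",
# giving up after max_tries attempts.
# Returns 0 once the output no longer contains "busy", 1 otherwise.
retry_while_busy() {
    max_tries=$1
    shift
    tries=0
    while "$@" 2>&1 | grep -q busy; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            echo "still busy after $tries attempts: $*" >&2
            return 1
        fi
        sleep 1
    done
    return 0
}

# Example (assumed mount point): retry_while_busy 60 umount /mnt/lustre
```

Like the quoted loop, this treats any output that does not contain "busy" as completion, so other umount errors still need to be checked separately.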

The console log for client 2 (trevis-53vm2) shows hung mount.nfs tasks:

[  720.526224] INFO: task mount.nfs:1314 blocked for more than 120 seconds.
[  720.527710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  720.529043] mount.nfs       D ffff8e4c7bf2a0e0     0  1314   1313 0x00000080
[  720.530429] Call Trace:
[  720.530899]  schedule+0x29/0x70
[  720.531762]  schedule_timeout+0x221/0x2d0
[  720.533040]  ? __slab_free+0x1b0/0x290
[  720.534069]  ? svc_close_list+0x88/0xa0 [sunrpc]
[  720.535194]  wait_for_completion+0xfd/0x140
[  720.536316]  ? wake_up_state+0x20/0x20
[  720.537321]  kthread_stop+0x4a/0xf0
[  720.538291]  nfs_callback_down+0x53/0xd0 [nfsv4]
[  720.539452]  nfs4_free_client+0x4f/0xc0 [nfsv4]
[  720.540600]  nfs_put_client+0xf2/0x140 [nfs]
[  720.541706]  nfs4_init_client+0x193/0x2f0 [nfsv4]
[  720.542908]  ? kmem_cache_alloc+0x35/0x1f0
[  720.543984]  ? __fscache_acquire_cookie+0x66/0x180 [fscache]
[  720.545309]  ? __fscache_acquire_cookie+0x66/0x180 [fscache]
[  720.546652]  ? __rpc_init_priority_wait_queue+0x81/0xc0 [sunrpc]
[  720.548038]  ? rpc_init_wait_queue+0x13/0x20 [sunrpc]
[  720.549251]  nfs_get_client+0x353/0x470 [nfs]
[  720.550352]  nfs4_set_client+0x9c/0x150 [nfsv4]
[  720.551492]  nfs4_create_server+0x13e/0x3b0 [nfsv4]
[  720.552722]  nfs4_remote_mount+0x2e/0x60 [nfsv4]
[  720.553887]  mount_fs+0x3e/0x1b0
[  720.554860]  ? __alloc_percpu+0x15/0x20
[  720.555900]  vfs_kern_mount+0x67/0x110
[  720.556909]  nfs_do_root_mount+0x86/0xc0 [nfsv4]
[  720.558073]  nfs4_try_mount+0x44/0xc0 [nfsv4]
[  720.559168]  ? get_nfs_version+0x27/0x90 [nfs]
[  720.560301]  nfs_fs_mount+0x4cb/0xdc0 [nfs]
[  720.561402]  ? nfs_clone_super+0x140/0x140 [nfs]
[  720.562568]  ? param_set_portnr+0x70/0x70 [nfs]
[  720.563682]  mount_fs+0x3e/0x1b0
[  720.564627]  ? __alloc_percpu+0x15/0x20
[  720.565694]  vfs_kern_mount+0x67/0x110
[  720.566695]  do_mount+0x1ef/0xce0
[  720.567627]  ? copy_mount_options+0xc0/0x170
[  720.568725]  SyS_mount+0x83/0xd0
[  720.569643]  system_call_fastpath+0x25/0x2a
[  720.570704]  ? system_call_after_swapgs+0xae/0x146
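
To check quickly whether other clients' console logs show the same kind of hang, the "blocked for more than" warnings can be pulled out with a short filter. This is a rough sketch that assumes the message format shown in the excerpt above. The function name hung_tasks and the file name console.log are placeholders, not names from Maloo or the test framework.

```shell
# Hypothetical helper: summarize kernel hung-task warnings read on stdin.
# Prints one line per warning: "<timestamp> <task> <pid> <seconds>".
# Lines that are not hung-task warnings (e.g. the call trace) are dropped.
hung_tasks() {
    sed -n 's/^\[ *\([0-9.]*\)] INFO: task \(.*\):\([0-9]*\) blocked for more than \([0-9]*\) seconds.*/\1 \2 \3 \4/p'
}

# Example (placeholder file name): hung_tasks < console.log
```

Each output line gives the kernel timestamp, task name, PID, and the hung-task threshold in seconds, which makes it easy to see whether the blocked tasks are all mount.nfs or include Lustre threads.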

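At a quick glance, I don't see anything wrong with Lustre. The blocked task is in the NFSv4 client mount path (nfs_callback_down waiting in kthread_stop), and no Lustre modules appear in the trace.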

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
runtests test_1 - Timeout occurred after 65 mins, last suite running was runtests

