[LU-10077] parallel-scale-nfsv4 test_racer_on_nfs: test failed to respond and timed out Created: 04/Oct/17  Updated: 12/Dec/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Casper Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None
Environment:

trevis, full, x86_64 servers, ppc clients
servers: el7.4, ldiskfs, branch master, v2.10.53.1, b3642
clients: el7.4, branch master, v2.10.53.1, b3642


Severity: 3

 Description   

https://testing.hpdd.intel.com/test_sessions/ba995751-659c-4e63-9b5b-fbf101137b78

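Numerous hung nfsd tasks appear in the MDS dmesg. In the trace below, an nfsd thread (PID 11569) is blocked in mutex_lock() called from nfsd4_process_open2():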

[ 8040.113687] nfsd            D ffff88001d72c980     0 11569      2 0x00000080
[ 8040.116595]  ffff8800512e7c20 0000000000000046 ffff880060d78fd0 ffff8800512e7fd8
[ 8040.119480]  ffff8800512e7fd8 ffff8800512e7fd8 ffff880060d78fd0 ffff88001d72c978
[ 8040.122352]  ffff88001d72c97c ffff880060d78fd0 00000000ffffffff ffff88001d72c980
[ 8040.125151] Call Trace:
[ 8040.127521]  [<ffffffff816aa3c9>] schedule_preempt_disabled+0x29/0x70
[ 8040.130182]  [<ffffffff816a82f7>] __mutex_lock_slowpath+0xc7/0x1d0
[ 8040.132807]  [<ffffffff816a770f>] mutex_lock+0x1f/0x2f
[ 8040.135325]  [<ffffffffc0376c7e>] nfsd4_process_open2+0x1ce/0x1210 [nfsd]
[ 8040.137891]  [<ffffffffc03540db>] ? fh_verify+0x16b/0x5f0 [nfsd]
[ 8040.140408]  [<ffffffffc03739b9>] ? nfs4_alloc_stid+0x59/0xb0 [nfsd]
[ 8040.142897]  [<ffffffffc0365182>] nfsd4_open+0x542/0x830 [nfsd]
[ 8040.145317]  [<ffffffffc0365845>] nfsd4_proc_compound+0x3d5/0x770 [nfsd]
[ 8040.147794]  [<ffffffffc0350593>] nfsd_dispatch+0xd3/0x280 [nfsd]
[ 8040.150257]  [<ffffffffc0257453>] svc_process_common+0x453/0x6f0 [sunrpc]
[ 8040.152659]  [<ffffffffc02577f3>] svc_process+0x103/0x190 [sunrpc]
[ 8040.155017]  [<ffffffffc034feff>] nfsd+0xdf/0x150 [nfsd]
[ 8040.157259]  [<ffffffffc034fe20>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 8040.159532]  [<ffffffff810b098f>] kthread+0xcf/0xe0
[ 8040.161673]  [<ffffffff8108ddeb>] ? do_exit+0x6bb/0xa40
[ 8040.163798]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[ 8040.165955]  [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
[ 8040.168022]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[ 8040.170107] INFO: task nfsd:11570 blocked for more than 120 seconds.


 Comments   
Comment by James Casper [ 22/Nov/17 ]

This is also seen with x86_64 clients in the mix:

https://testing.hpdd.intel.com/test_sets/2b1c4958-c58b-11e7-a066-52540065bddc
https://testing.hpdd.intel.com/test_sets/6ed23a40-c9b4-11e7-9c63-52540065bddc
https://testing.hpdd.intel.com/test_sets/6d92957c-c576-11e7-a066-52540065bddc

Generated at Sat Feb 10 02:31:51 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.