[LU-15744] sanity-quota test_3a: ldlm_lockd.c:719:ldlm_handle_ast_error()) ### client (nid 10.240.42.19@tcp) returned error from blocking AST
Created: 14/Apr/22
Updated: 21/Dec/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-13644 sanity-quota test_3a fails with 'writ... Open
is related to LU-14387 sanity-quota tests fail with “lfs: fa... Open
is related to LU-14279 sanity-quota test_3b: write success, ... Resolved
Severity: 3

Description

This issue was created by maloo for Cliff White <cwhite@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/04d311dc-2cdd-41d7-b466-153348f0b7ce

The system appears to go bad well before the failing test runs. The logs show a client eviction just as sanity-quota starts:

[29069.300962] Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-quota ============----- Mon Apr 4 01:48:40 UTC 2022
[29071.296226] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
[29072.105819] Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests: 55
[29072.142356] LustreError: 1038024:0:(ldlm_lockd.c:719:ldlm_handle_ast_error()) ### client (nid 10.240.42.19@tcp) returned error from blocking AST (req@000000002456a2c9 x1729110315565056 status -107 rc -107), evict it ns: filter-lustre-OST0005_UUID lock: 00000000f3771d86/0xceaec102ff073de4 lrc: 4/0,0 mode: PW/PW res: [0x2e4b0:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) gid 0 flags: 0x60000400030020 nid: 10.240.42.19@tcp remote: 0x4d8ebe6f641e09a9 expref: 61 pid: 1038030 timeout: 29173 lvb_type: 0
[29072.142945] LustreError: 138-a: lustre-OST0003: A client on nid 10.240.42.19@tcp was evicted due to a lock blocking callback time out: rc -107
[29072.150901] LustreError: 1038024:0:(ldlm_lockd.c:719:ldlm_handle_ast_error()) Skipped 1 previous similar message
[29072.155542] LustreError: 945061:0:(ldlm_lockd.c:259:expired_lock_main()) ### lock callback timer expired after 0s: evicting client at 10.240.42.19@tcp  ns: filter-lustre-OST0004_UUID lock: 0000000060efdb4c/0xceaec102ff073b91 lrc: 3/0,0 mode: PW/PW res: [0x2e58c:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) gid 0 flags: 0x60000400030020 nid: 10.240.42.19@tcp remote: 0x4d8ebe6f641dd3b6 expref: 62 pid: 1038016 timeout: 0 lvb_type: 0
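
For triage, the fields that matter in these messages are the client NID, the return code, and the OST namespace. The following is a minimal Python sketch of pulling them out, assuming the message layout quoted above. The regular expressions and the LINUX_ERRNO table are illustrative and are not part of any Lustre tool. The errno names are the Linux values: 107 is ENOTCONN, 22 is EINVAL, and 122 is EDQUOT.

```python
import re

# Linux errno values that appear in this ticket. They are hard-coded so the
# sketch gives the same answer on non-Linux hosts.
LINUX_ERRNO = {22: "EINVAL", 107: "ENOTCONN", 122: "EDQUOT"}

# Hypothetical patterns written against the message layout quoted above.
NID_RE = re.compile(r"\bnid:? (\S+?@\w+)")
RC_RE = re.compile(r"\brc:? (?:= )?(-?\d+)")
NS_RE = re.compile(r"\bns: (\S+)")


def summarize(line):
    """Return (nid, errno name, namespace) for one console log line.

    Any field that is not present in the line is returned as None.
    """
    nid = NID_RE.search(line)
    rc = RC_RE.search(line)
    ns = NS_RE.search(line)
    code = abs(int(rc.group(1))) if rc else None
    return (
        nid.group(1) if nid else None,
        LINUX_ERRNO.get(code, str(code)) if code is not None else None,
        ns.group(1) if ns else None,
    )


# First eviction message from the log above, cut off after the lock handle.
sample = (
    "LustreError: 1038024:0:(ldlm_lockd.c:719:ldlm_handle_ast_error()) ### "
    "client (nid 10.240.42.19@tcp) returned error from blocking AST "
    "(req@000000002456a2c9 x1729110315565056 status -107 rc -107), evict it "
    "ns: filter-lustre-OST0005_UUID lock: 00000000f3771d86/0xceaec102ff073de4"
)
print(summarize(sample))
```

On the first eviction message this yields the NID 10.240.42.19@tcp, ENOTCONN, and the namespace filter-lustre-OST0005_UUID.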

The tests that follow fail, with many dropped connections:

[ 1246.301471] Lustre: 34254:0:(client.c:2282:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1649008457/real 1649008457]  req@0000000026f25ab6 x1729110135806080/t0(0) o400->MGC10.240.42.19@tcp@10.240.42.19@tcp:26/25 lens 224/224 e 0 to 1 dl 1649008464 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker/u4:4.0'
[ 1246.306932] LustreError: 166-1: MGC10.240.42.19@tcp: Connection to MGS (at 10.240.42.19@tcp) was lost; in progress operations using this service will fail
[ 1263.645767] Lustre: Evicted from MGS (at 10.240.42.19@tcp) after server handle changed from 0x4d8ebe6f5a182d6e to 0x4d8ebe6f5a18f5a4
[ 1263.648365] Lustre: MGC10.240.42.19@tcp: Connection restored to 10.240.42.19@tcp (at 10.240.42.19@tcp)
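
The sent and dl fields in the ptlrpc_expire_one_request() message can be turned into wall-clock time and lined up against the DEBUG MARKER banners. The sketch below assumes both fields are Unix epoch seconds in UTC, which is consistent with the 2022 date of the run but is still an assumption. The request_times() helper and its patterns are illustrative only.

```python
import re
from datetime import datetime, timezone

# Timed-out request message from the log above, cut off after "ref 1".
line = (
    "Request sent has timed out for slow reply: "
    "[sent 1649008457/real 1649008457]  req@0000000026f25ab6 "
    "x1729110135806080/t0(0) o400->MGC10.240.42.19@tcp@10.240.42.19@tcp:26/25 "
    "lens 224/224 e 0 to 1 dl 1649008464 ref 1"
)


def request_times(msg):
    """Return (sent, deadline) as integers, assumed to be epoch seconds."""
    sent = int(re.search(r"\bsent (\d+)", msg).group(1))
    deadline = int(re.search(r"\bdl (\d+)", msg).group(1))
    return sent, deadline


sent, deadline = request_times(line)
sent_utc = datetime.fromtimestamp(sent, tz=timezone.utc)
print(sent_utc.isoformat(), "deadline after", deadline - sent, "s")
```

If the assumption holds, this request was sent at 2022-04-03 17:54:17 UTC with a deadline 7 seconds later, which is several hours before the sanity-quota banner at Mon Apr 4 01:48:40 UTC 2022.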

sanity-quota was the last test to fail:

[31214.520365] LustreError: 110057:0:(lcommon_cl.c:197:cl_file_inode_init()) lustre: failed to initialize cl_object [0x20000a811:0x2496:0x0]: rc = -22
[31214.522974] LustreError: 110057:0:(llite_lib.c:2837:ll_prep_inode()) new_inode -fatal: rc -22
[31216.139993] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-quota test_3a: @@@@@@ FAIL: write success, but expect EDQUOT 
[31216.590783] Lustre: DEBUG MARKER: sanity-quota test_3a: @@@@@@ FAIL: write success, but expect EDQUOT


Comments
Comment by Patrick Farrell [ 18/Apr/22 ]

Just a note to focus investigations: as Cliff suggests, the issue here is the eviction. The quota failures all appear to be fallout. The eviction is from an earlier test and may not be from sanity-quota at all.
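
One way to check which test an eviction belongs to is to pair each eviction message with the last DEBUG MARKER banner that precedes it in the same console log. The sketch below is only an illustration of that idea. The function name, the eviction phrases it searches for, and the rule that markers starting with "/" are echoed helper commands rather than banners are all assumptions drawn from the excerpts in this ticket.

```python
import re

MARKER_RE = re.compile(r"DEBUG MARKER: (.*)")
# Phrases taken from the eviction messages quoted in this ticket.
EVICT_RE = re.compile(r"was evicted|evicting client|evict it")


def evictions_by_marker(lines):
    """Pair every eviction message with the last banner marker before it.

    Returns a list of (marker text or None, dmesg timestamp string).
    """
    last_marker = None
    hits = []
    for line in lines:
        m = MARKER_RE.search(line)
        if m:
            text = m.group(1).strip()
            # Assumption: markers that start with "/" are echoed commands
            # such as "/usr/sbin/lctl mark ...", not suite or test banners.
            if not text.startswith("/"):
                last_marker = text
            continue
        if EVICT_RE.search(line):
            stamp = line.split("]", 1)[0].lstrip("[ ")
            hits.append((last_marker, stamp))
    return hits


# Three lines copied from the log excerpt in the description.
log = [
    "[29069.300962] Lustre: DEBUG MARKER: -----============= acceptance-small: "
    "sanity-quota ============----- Mon Apr 4 01:48:40 UTC 2022",
    "[29072.105819] Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests: 55",
    "[29072.142945] LustreError: 138-a: lustre-OST0003: A client on nid "
    "10.240.42.19@tcp was evicted due to a lock blocking callback time out: rc -107",
]
for marker, stamp in evictions_by_marker(log):
    print(stamp, "->", marker)
```

Run over a full console log rather than this three-line excerpt, the same loop would also surface any evictions logged under earlier suite banners.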
