Lustre / LU-15744

sanity-quota test_3a: ldlm_lockd.c:719:ldlm_handle_ast_error()) ### client (nid 10.240.42.19@tcp) returned error from blocking AST

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.15.0
    • None
    • 3

    Description

      This issue was created by maloo for Cliff White <cwhite@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/04d311dc-2cdd-41d7-b466-153348f0b7ce

      The system appears to become unhealthy before the failing test actually runs. The logs show client evictions within seconds of sanity-quota starting:

      [29069.300962] Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-quota ============----- Mon Apr 4 01:48:40 UTC 2022
      [29071.296226] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
      [29072.105819] Lustre: DEBUG MARKER: /usr/sbin/lctl mark excepting tests: 55
      [29072.142356] LustreError: 1038024:0:(ldlm_lockd.c:719:ldlm_handle_ast_error()) ### client (nid 10.240.42.19@tcp) returned error from blocking AST (req@000000002456a2c9 x1729110315565056 status -107 rc -107), evict it ns: filter-lustre-OST0005_UUID lock: 00000000f3771d86/0xceaec102ff073de4 lrc: 4/0,0 mode: PW/PW res: [0x2e4b0:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) gid 0 flags: 0x60000400030020 nid: 10.240.42.19@tcp remote: 0x4d8ebe6f641e09a9 expref: 61 pid: 1038030 timeout: 29173 lvb_type: 0
      [29072.142945] LustreError: 138-a: lustre-OST0003: A client on nid 10.240.42.19@tcp was evicted due to a lock blocking callback time out: rc -107
      [29072.150901] LustreError: 1038024:0:(ldlm_lockd.c:719:ldlm_handle_ast_error()) Skipped 1 previous similar message
      [29072.155542] LustreError: 945061:0:(ldlm_lockd.c:259:expired_lock_main()) ### lock callback timer expired after 0s: evicting client at 10.240.42.19@tcp  ns: filter-lustre-OST0004_UUID lock: 0000000060efdb4c/0xceaec102ff073b91 lrc: 3/0,0 mode: PW/PW res: [0x2e58c:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) gid 0 flags: 0x60000400030020 nid: 10.240.42.19@tcp remote: 0x4d8ebe6f641dd3b6 expref: 62 pid: 1038016 timeout: 0 lvb_type: 0
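
      When triaging messages like the ones above, it can help to translate the negative rc values into errno names. The following is a minimal sketch, not part of the Lustre tooling: the helper name decode_rc, the regular expression, and the small errno table are assumptions made for illustration, and the numeric values assume standard Linux errno numbering.

```python
import re

# Assumed Linux errno numbering; only the codes relevant to this ticket are listed.
LINUX_ERRNO = {22: "EINVAL", 107: "ENOTCONN", 122: "EDQUOT"}

# Matches fragments such as "rc -107", "rc = -22" or "rc: -107".
# The word boundary keeps it from matching "lrc:" or "rrc:" in lock dumps.
RC_PATTERN = re.compile(r"\brc\s*[:=]?\s*-(\d+)")


def decode_rc(line):
    """Return unique (rc, errno name) pairs for negative rc values in a log line."""
    codes = {int(m) for m in RC_PATTERN.findall(line)}
    return [(-c, LINUX_ERRNO.get(c, "UNKNOWN")) for c in sorted(codes)]


# Sample line copied from the eviction message in this ticket.
sample = (
    "LustreError: 138-a: lustre-OST0003: A client on nid 10.240.42.19@tcp "
    "was evicted due to a lock blocking callback time out: rc -107"
)
print(decode_rc(sample))
```

      Under these assumptions, rc -107 corresponds to ENOTCONN, meaning the blocking AST could not be delivered because the connection to the client was already gone.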
      

      The tests that follow fail, and the logs show many dropped connections:

      [ 1246.301471] Lustre: 34254:0:(client.c:2282:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1649008457/real 1649008457]  req@0000000026f25ab6 x1729110135806080/t0(0) o400->MGC10.240.42.19@tcp@10.240.42.19@tcp:26/25 lens 224/224 e 0 to 1 dl 1649008464 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker/u4:4.0'
      [ 1246.306932] LustreError: 166-1: MGC10.240.42.19@tcp: Connection to MGS (at 10.240.42.19@tcp) was lost; in progress operations using this service will fail
      [ 1263.645767] Lustre: Evicted from MGS (at 10.240.42.19@tcp) after server handle changed from 0x4d8ebe6f5a182d6e to 0x4d8ebe6f5a18f5a4
      [ 1263.648365] Lustre: MGC10.240.42.19@tcp: Connection restored to 10.240.42.19@tcp (at 10.240.42.19@tcp)
      

      sanity-quota test_3a was the last test to fail:

      [31214.520365] LustreError: 110057:0:(lcommon_cl.c:197:cl_file_inode_init()) lustre: failed to initialize cl_object [0x20000a811:0x2496:0x0]: rc = -22
      [31214.522974] LustreError: 110057:0:(llite_lib.c:2837:ll_prep_inode()) new_inode -fatal: rc -22
      [31216.139993] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-quota test_3a: @@@@@@ FAIL: write success, but expect EDQUOT 
      [31216.590783] Lustre: DEBUG MARKER: sanity-quota test_3a: @@@@@@ FAIL: write success, but expect EDQUOT
      

            People

              wc-triage WC Triage
              maloo Maloo