Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-15846

sanity-quota interop fail: set_param: setting debug=+quota,trace: Invalid argument

Details

    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for Cliff White <cwhite@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/f26e6614-1b8b-4b79-a70f-ee4ce7252e9f

      Client appears to have been evicted at start or prior to start of test.

      [11019.259413] Lustre: DEBUG MARKER: == rpc test complete, duration -o sec ================================================================ 02:16:49 (1651976209)
      [11019.601614] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
      [11020.000907] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
      [11020.920108] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-39vm2.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
      [11021.156356] Lustre: DEBUG MARKER: trevis-39vm2.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
      [11021.251916] LustreError: 7705:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 0@lo) returned error from blocking AST (req@ffff971c467da880 x1732211603062336 status -107 rc -107), evict it ns: mdt-lustre-MDT0000_UUID lock: ffff971ccc36d680/0x1c652b3f19039ba8 lrc: 4/0,0 mode: PR/PR res: [0x20001a9e2:0x5a6a:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 0@lo remote: 0x1c652b3f19039b9a expref: 934 pid: 10604 timeout: 11121 lvb_type: 0
      [11021.259068] LustreError: 138-a: lustre-MDT0000: A client on nid 0@lo was evicted due to a lock blocking callback time out: rc -107
      [11021.261382] LustreError: 10594:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 0s: evicting client at 0@lo  ns: mdt-lustre-MDT0000_UUID lock: ffff971ccc36d680/0x1c652b3f19039ba8 lrc: 3/0,0 mode: PR/PR res: [0x20001a9e2:0x5a6a:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 0@lo remote: 0x1c652b3f19039b9a expref: 935 pid: 10604 timeout: 0 lvb_type: 0
      [11021.369209] LustreError: 7705:0:(client.c:1213:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff971c467da880 x1732211603062400/t0(0) o104->lustre-MDT0000@0@lo:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      [11022.532353] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.mdt=ug
      [11022.923428] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=ug
      [11023.418437] Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: sanity-quota ============----- Sun May  8 02:16:53 UTC 2022
      [11023.673072] Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-quota ============----- Sun May 8 02:16:53 UTC 2022
      
      [11022.550571] LustreError: 17675:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.240.39.127@tcp) returned error from blocking AST (req@ffff8a9638ad8d80 x1732211273986240 status -107 rc -107), evict it ns: filter-lustre-OST0002_UUID lock: ffff8a95b6db2240/0x1ad8c47013a5028e lrc: 4/0,0 mode: PW/PW res: [0x161ba:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000400010020 nid: 10.240.39.127@tcp remote: 0x1c652b3f190530c1 expref: 63 pid: 12899 timeout: 11122 lvb_type: 0
      [11022.558744] LustreError: 138-a: lustre-OST0002: A client on nid 10.240.39.127@tcp was evicted due to a lock blocking callback time out: rc -107
      [11022.563250] LustreError: 20721:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 0s: evicting client at 10.240.39.127@tcp  ns: filter-lustre-OST0003_UUID lock: ffff8a960f060b40/0x1ad8c47013a4c8f8 lrc: 3/0,0 mode: PW/PW res: [0x1615e:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000400030020 nid: 10.240.39.127@tcp remote: 0x1c652b3f18fe90fa expref: 64 pid: 17700 timeout: 0 lvb_type: 0
      

      Attachments

        Issue Links

          Activity

            [LU-15846] sanity-quota interop fail: set_param: setting debug=+quota,trace: Invalid argument

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/57100/
            Subject: LU-15846 tests: don't use comma-separated debug flags
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: ac48425b6575aa0a7c2be1b48cd86118f8c95155

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/57100/ Subject: LU-15846 tests: don't use comma-separated debug flags Project: fs/lustre-release Branch: b2_15 Current Patch Set: Commit: ac48425b6575aa0a7c2be1b48cd86118f8c95155

            "Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57100
            Subject: LU-15846 tests: don't use comma-separated debug flags
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: fbfbd7558713772842f2e9f00983c5d0f69bd3c1

            gerrit Gerrit Updater added a comment - "Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57100 Subject: LU-15846 tests: don't use comma-separated debug flags Project: fs/lustre-release Branch: b2_15 Current Patch Set: 1 Commit: fbfbd7558713772842f2e9f00983c5d0f69bd3c1
            pjones Peter Jones added a comment -

            Landed for 2.16

            pjones Peter Jones added a comment - Landed for 2.16

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47308/
            Subject: LU-15846 tests: don't use comma-separated debug flags
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: fe8315c25ed093d77a6f366e2a4849aba008b680

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47308/ Subject: LU-15846 tests: don't use comma-separated debug flags Project: fs/lustre-release Branch: master Current Patch Set: Commit: fe8315c25ed093d77a6f366e2a4849aba008b680

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47308
            Subject: LU-15846 tests: don't use comma-separated debug flags
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 12282314e2567973b1a7cb4d6806e987bf1a6755

            gerrit Gerrit Updater added a comment - "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47308 Subject: LU-15846 tests: don't use comma-separated debug flags Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 12282314e2567973b1a7cb4d6806e987bf1a6755

            This may also be an interop/test-framework.sh bug when running on older servers (2.12.6.ddn66 in this case) introduced by patch https://review.whamcloud.com/43741 "LU-13055 libcfs: allow comma-separated masks" that was landed as 2.14.0-ddn37-36-gba4d37b9fc.

            The last lines of the test log are:

            trevis-39vm4: error: set_param: setting /sys/kernel/debug/lnet/debug=+quota,trace: Invalid argument
            trevis-39vm3: error: set_param: setting /sys/kernel/debug/lnet/debug=+quota,trace: Invalid argument
            

            which is caused by a change in sanity-quota.sh:

            quota_init() {
                    do_nodes $(comma_list $(nodes_list)) \
                            "$LCTL set_param -n debug=+quota,trace"
            }
            

            And indeed, running a Maloo search for b_es6_0 clients and b_es5_2 servers shows that sanity-quota.sh has been failing since 2022-03-10, which is the day after that patch landed.

            adilger Andreas Dilger added a comment - This may also be an interop/test-framework.sh bug when running on older servers (2.12.6.ddn66 in this case) introduced by patch https://review.whamcloud.com/43741 " LU-13055 libcfs: allow comma-separated masks " that was landed as 2.14.0-ddn37-36-gba4d37b9fc. The last lines of the test log are: trevis-39vm4: error: set_param: setting /sys/kernel/debug/lnet/debug=+quota,trace: Invalid argument trevis-39vm3: error: set_param: setting /sys/kernel/debug/lnet/debug=+quota,trace: Invalid argument which is caused by a change in sanity-quota.sh : quota_init() { do_nodes $(comma_list $(nodes_list)) \ "$LCTL set_param -n debug=+quota,trace" } And indeed, running a Maloo search for b_es6_0 clients and b_es5_2 servers shows that sanity-quota.sh has been failing since 2022-03-10, which is the day after that patch landed.

            People

              adilger Andreas Dilger
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: