[LU-15846] sanity-quota interop fail: set_param: setting debug=+quota,trace: Invalid argument Created: 11/May/22 Updated: 07/Dec/23 Resolved: 04/Aug/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Andreas Dilger |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
This issue was created by maloo for Cliff White <cwhite@whamcloud.com> This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/f26e6614-1b8b-4b79-a70f-ee4ce7252e9f Client appears to have been evicted at start or prior to start of test. [11019.259413] Lustre: DEBUG MARKER: == rpc test complete, duration -o sec ================================================================ 02:16:49 (1651976209) [11019.601614] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null [11020.000907] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null [11020.920108] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-39vm2.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 [11021.156356] Lustre: DEBUG MARKER: trevis-39vm2.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4 [11021.251916] LustreError: 7705:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 0@lo) returned error from blocking AST (req@ffff971c467da880 x1732211603062336 status -107 rc -107), evict it ns: mdt-lustre-MDT0000_UUID lock: ffff971ccc36d680/0x1c652b3f19039ba8 lrc: 4/0,0 mode: PR/PR res: [0x20001a9e2:0x5a6a:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 0@lo remote: 0x1c652b3f19039b9a expref: 934 pid: 10604 timeout: 11121 lvb_type: 0 [11021.259068] LustreError: 138-a: lustre-MDT0000: A client on nid 0@lo was evicted due to a lock blocking callback time out: rc -107 [11021.261382] LustreError: 10594:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 0s: evicting client at 0@lo ns: mdt-lustre-MDT0000_UUID lock: ffff971ccc36d680/0x1c652b3f19039ba8 lrc: 3/0,0 mode: PR/PR res: [0x20001a9e2:0x5a6a:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 0@lo remote: 0x1c652b3f19039b9a expref: 935 pid: 10604 timeout: 0 lvb_type: 0 [11021.369209] LustreError: 7705:0:(client.c:1213:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff971c467da880 x1732211603062400/t0(0) o104->lustre-MDT0000@0@lo:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 [11022.532353] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.mdt=ug [11022.923428] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=ug [11023.418437] Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: sanity-quota ============----- Sun May 8 02:16:53 UTC 2022 [11023.673072] Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-quota ============----- Sun May 8 02:16:53 UTC 2022 [11022.550571] LustreError: 17675:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.240.39.127@tcp) returned error from blocking AST (req@ffff8a9638ad8d80 x1732211273986240 status -107 rc -107), evict it ns: filter-lustre-OST0002_UUID lock: ffff8a95b6db2240/0x1ad8c47013a5028e lrc: 4/0,0 mode: PW/PW res: [0x161ba:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000400010020 nid: 10.240.39.127@tcp remote: 0x1c652b3f190530c1 expref: 63 pid: 12899 timeout: 11122 lvb_type: 0 [11022.558744] LustreError: 138-a: lustre-OST0002: A client on nid 10.240.39.127@tcp was evicted due to a lock blocking callback time out: rc -107 [11022.563250] LustreError: 20721:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 0s: evicting client at 10.240.39.127@tcp ns: filter-lustre-OST0003_UUID lock: ffff8a960f060b40/0x1ad8c47013a4c8f8 lrc: 3/0,0 mode: PW/PW res: [0x1615e:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000400030020 nid: 10.240.39.127@tcp remote: 0x1c652b3f18fe90fa expref: 64 pid: 17700 timeout: 0 lvb_type: 0 |
| Comments |
| Comment by Andreas Dilger [ 12/May/22 ] |
|
This may also be an interop/test-framework.sh bug when running on older servers (2.12.6.ddn66 in this case) introduced by patch https://review.whamcloud.com/43741 " The last lines of the test log are: trevis-39vm4: error: set_param: setting /sys/kernel/debug/lnet/debug=+quota,trace: Invalid argument trevis-39vm3: error: set_param: setting /sys/kernel/debug/lnet/debug=+quota,trace: Invalid argument which is caused by a change in sanity-quota.sh: quota_init() {
do_nodes $(comma_list $(nodes_list)) \
"$LCTL set_param -n debug=+quota,trace"
}
And indeed, running a Maloo search for b_es6_0 clients and b_es5_2 servers shows that sanity-quota.sh has been failing since 2022-03-10, which is the day after that patch landed. |
| Comment by Gerrit Updater [ 12/May/22 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47308 |
| Comment by Gerrit Updater [ 04/Aug/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47308/ |
| Comment by Peter Jones [ 04/Aug/22 ] |
|
Landed for 2.16 |