Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-11929

sanity-quota test 7d hangs with ‘(qmt_handler.c:421:qmt_dqacq0()) $$$ Release too much!’ on the MDS

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Lustre 2.13.0, Lustre 2.10.7, Lustre 2.10.8
    • Fix Version/s: None
    • Labels:
    • Environment:
      ZFS
    • Severity:
      3
    • Rank (Obsolete):
      9223372036854775807

      Description

      sanity-quota test_7d hangs for ZFS testing for 2.10.7; 2.10.6.13 and 2.10.6.35 only ... so far.

      Looking at the logs at https://testing.whamcloud.com/test_sessions/6d3f7b3a-f92d-4924-9192-f7eecd7d5b49 , the last lines of the client (vm9) test_log are

      CMD: trevis-40vm11 /usr/sbin/lctl get_param -n osd-zfs.lustre-OST0006.quota_slave.info |
      				grep user | awk '{ print \$3 }'
      CMD: trevis-40vm12 lctl set_param fail_val=0 fail_loc=0
      fail_val=0
      fail_loc=0
      running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
       [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d7d.sanity-quota/f7d.sanity-quota] [count=21] [oflag=sync]
      dd: error writing '/mnt/lustre/d7d.sanity-quota/f7d.sanity-quota': Disk quota exceeded
      20+0 records in
      19+0 records out
      19922944 bytes (20 MB) copied, 2033.43 s, 9.8 kB/s
      running as uid/gid/euid/egid 60001/60001/60001/60001, groups:
       [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d7d.sanity-quota/f7d.sanity-quota-1] [count=21] [oflag=sync]
      

      On the MDS (vm12), we see:

      [44392.869341] Lustre: DEBUG MARKER: == sanity-quota test 7d: Quota reintegration (Transfer index in multiple bulks) ====================== 23:08:17 (1549148897)
      [44393.070435] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      [44399.467416] Lustre: DEBUG MARKER: lctl set_param fail_val=0 fail_loc=0
      [44399.859790] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=none
      [44414.275402] Lustre: DEBUG MARKER: lctl set_param fail_val=0 fail_loc=0x608
      [44414.633037] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=u
      [44438.618804] Lustre: DEBUG MARKER: lctl set_param fail_val=0 fail_loc=0
      [44953.937332] LustreError: 27855:0:(qmt_handler.c:421:qmt_dqacq0()) $$$ Release too much! uuid:lustre-MDT0000-lwp-OST0000_UUID release:1048576 granted:0, total:0 qmt:lustre-QMT0000 pool:0-dt id:500 enforced:1 hard:14150438 soft:13476608 granted:0 time:0 qunit:1048576 edquot:0 may_rel:0 revoke:0
      [44953.941728] LustreError: 27855:0:(qmt_handler.c:421:qmt_dqacq0()) Skipped 18 previous similar messages
      [45553.974287] LustreError: 27856:0:(qmt_handler.c:421:qmt_dqacq0()) $$$ Release too much! uuid:lustre-MDT0000-lwp-OST0000_UUID release:1048576 granted:0, total:0 qmt:lustre-QMT0000 pool:0-dt id:500 enforced:1 hard:14150438 soft:13476608 granted:0 time:0 qunit:1048576 edquot:0 may_rel:0 revoke:0
      [45553.974290] LustreError: 27855:0:(qmt_handler.c:421:qmt_dqacq0()) $$$ Release too much! uuid:lustre-MDT0000-lwp-OST0000_UUID release:1048576 granted:0, total:0 qmt:lustre-QMT0000 pool:0-dt id:500 enforced:1 hard:14150438 soft:13476608 granted:0 time:0 qunit:1048576 edquot:0 may_rel:0 revoke:0
      [45553.974295] LustreError: 27855:0:(qmt_handler.c:421:qmt_dqacq0()) Skipped 17 previous similar messages
      [46154.360051] LustreError: 27856:0:(qmt_handler.c:421:qmt_dqacq0()) $$$ Release too much! uuid:lustre-MDT0000-lwp-OST0000_UUID release:1048576 granted:0, total:0 qmt:lustre-QMT0000 pool:0-dt id:500 enforced:1 hard:14150438 soft:13476608 granted:0 time:0 qunit:1048576 edquot:0 may_rel:0 revoke:0
      [46154.365743] LustreError: 27856:0:(qmt_handler.c:421:qmt_dqacq0()) Skipped 18 previous similar messages
      [46814.027256] LustreError: 26491:0:(qmt_handler.c:421:qmt_dqacq0()) $$$ Release too much! uuid:lustre-MDT0000-lwp-OST0000_UUID release:1048576 granted:0, total:0 qmt:lustre-QMT0000 pool:0-dt id:500 enforced:1 hard:14150438 soft:13476608 granted:0 time:0 qunit:1048576 edquot:0 may_rel:0 revoke:0
      [46814.032250] LustreError: 26491:0:(qmt_handler.c:421:qmt_dqacq0()) Skipped 21 previous similar messages
      [47414.195940] LustreError: 27855:0:(qmt_handler.c:421:qmt_dqacq0()) $$$ Release too much! uuid:lustre-MDT0000-lwp-OST0000_UUID release:1048576 granted:0, total:0 qmt:lustre-QMT0000 pool:0-dt id:500 enforced:1 hard:14150438 soft:13476608 granted:0 time:0 qunit:1048576 edquot:0 may_rel:0 revoke:0
      [47414.200389] LustreError: 27855:0:(qmt_handler.c:421:qmt_dqacq0()) Skipped 19 previous similar messages
      

      ON the OSS, we see

      [44434.303364] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osd-zfs.lustre-OST0006.quota_slave.info |
      [44434.303364] 				grep user | awk '{ print $3 }'
      [44434.662998] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osd-zfs.lustre-OST0006.quota_slave.info |
      [44434.662998] 				grep user | awk '{ print $3 }'
      [44530.264427] LustreError: 23139:0:(qsd_handler.c:340:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:lustre-OST0000 qtype:usr id:500 enforced:1 granted:1048576 pending:0 waiting:0 req:1 usage:0 qunit:0 qtune:0 edquot:0
      [44530.268212] LustreError: 23139:0:(qsd_handler.c:340:qsd_req_completion()) Skipped 19 previous similar messages
      [45130.433168] LustreError: 23139:0:(qsd_handler.c:340:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:lustre-OST0000 qtype:usr id:500 enforced:1 granted:1048576 pending:0 waiting:0 req:1 usage:0 qunit:0 qtune:0 edquot:0
      [45130.436534] LustreError: 23139:0:(qsd_handler.c:340:qsd_req_completion()) Skipped 19 previous similar messages
      [45790.255910] LustreError: 23139:0:(qsd_handler.c:340:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:lustre-OST0000 qtype:usr id:500 enforced:1 granted:1048576 pending:0 waiting:0 req:1 usage:0 qunit:0 qtune:0 edquot:0
      [45790.259658] LustreError: 23139:0:(qsd_handler.c:340:qsd_req_completion()) Skipped 21 previous similar messages
      [46450.453245] LustreError: 23139:0:(qsd_handler.c:340:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:lustre-OST0000 qtype:grp id:500 enforced:1 granted:1048576 pending:0 waiting:0 req:1 usage:0 qunit:0 qtune:0 edquot:0
      [46450.456931] LustreError: 23139:0:(qsd_handler.c:340:qsd_req_completion()) Skipped 21 previous similar messages
      [47110.650703] LustreError: 23139:0:(qsd_handler.c:340:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:lustre-OST0000 qtype:usr id:500 enforced:1 granted:1048576 pending:0 waiting:0 req:1 usage:0 qunit:0 qtune:0 edquot:0
      [47110.654670] LustreError: 23139:0:(qsd_handler.c:340:qsd_req_completion()) Skipped 21 previous similar messages
      [47770.563329] LustreError: 23140:0:(qsd_handler.c:340:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:lustre-OST0000 qtype:grp id:500 enforced:1 granted:1048576 pending:0 waiting:0 req:1 usage:0 qunit:0 qtune:0 edquot:0
      [47770.568287] LustreError: 23140:0:(qsd_handler.c:340:qsd_req_completion()) Skipped 21 previous similar messages
      

      We’ve only seen this test hang with these error messages twice; the other failure logs are at
      https://testing.whamcloud.com/test_sets/feb38b6a-1a93-11e9-8388-52540065bddc .

      Yet, we’ve seen sanity-quota test_7d hang many times in a DNE configuration with ZFS; these failures do not have the error messages above. We’ve seen the hangs with no error messages for 2.10.6, 2.10.7 and 2.11.56.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                wc-triage WC Triage
                Reporter:
                jamesanunez James Nunez
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated: