LU-1438: quota_chk_acq_common() still haven't managed to acquire quota

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • Lustre 1.8.7
    • None
    • lustre-1.8.7-wc1, RHEL5.7 for servers, RHEL6.2 for clients
    • 3
    • 4584

    Description

      We are seeing a quota-related problem. Quotas are enabled on the filesystem. The customer changed the group ownership of many large files, then ran the lfs quotacheck command.
      After that, they got "disk quota exceeded" messages even though the group had not exceeded its quota limit.

      On the OSS/MDS side, the following messages have appeared since the group was changed:
      (quota_interface.c:473:quota_chk_acq_common()) still haven't managed to
      acquire quota space from the quota master after 20 retries (err=0, rc=0)

      This looks similar to LU-428.

          Activity

            pjones Peter Jones added a comment -

            ok thanks Ihara

            ihara Shuichi Ihara (Inactive) added a comment -

            This might be caused by LU-1584 and LU-1720. Please close this ticket.
            I will open a new ticket if we see the same issue after applying the LU-1584 and LU-1720 patches.

            ihara Shuichi Ihara (Inactive) added a comment -

            We haven't applied the patches yet, but will apply them at the next scheduled maintenance window.

            niu Niu Yawei (Inactive) added a comment -

            Probably. Is there anything useful in the debug log?

            ihara Shuichi Ihara (Inactive) added a comment -

            Might this be related to the 32-bit quota setting limitation?

            ihara Shuichi Ihara (Inactive) added a comment -

            Hi Niu,

            Thanks for this debug patch. I will ask the customer whether we can apply it.

            niu Niu Yawei (Inactive) added a comment -

            Hi Ihara,

            Could you apply this debug patch? Then we'll see much more debug information in the syslog along with the "still can't acquire..." messages. Thanks.

            ihara Shuichi Ihara (Inactive) added a comment -

            Hi Niu,

            I uploaded the debug files to uploads/LU-1438/debugfile.20120628.gz.
            However, these messages are not easy to reproduce, although they still show up irregularly in the system log.
            The Lustre debug file doesn't contain the messages. Perhaps the maximum debug size (100MB) was exceeded quickly?
            Any ideas on how to capture debug information in this situation?

            niu Niu Yawei (Inactive) added a comment -

            Hi Ihara,

            The debug patch would be similar to enabling D_TRACE & D_QUOTA for the debug log. If the customer can't afford the D_TRACE debug log, we can enable only D_QUOTA first to collect some debug logs.

            The 28760 is the pid, and the 0 is the 'extern pid' (it looks like it's always 0 for now, so you can just ignore it).
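            For anyone sifting through syslog for these messages, the "pid:extern pid" prefix described above can be pulled out with a small script. This is only a sketch in Python: the regular expression is inferred from the single message quoted in this ticket, and the function name and field labels (parse_quota_message, extern_pid, and so on) are made up for illustration, not taken from Lustre.

```python
import re

# Pattern inferred from the one console message quoted in this ticket.
# The group names are labels chosen for this sketch, not Lustre terminology
# (apart from "pid" and "extern pid", which come from the comment above).
LOG_RE = re.compile(
    r"Lustre: (?P<pid>\d+):(?P<extern_pid>\d+):"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"still haven't managed to acquire quota space from the quota master "
    r"after (?P<retries>\d+) retries \(err=(?P<err>-?\d+), rc=(?P<rc>-?\d+)\)"
)

NUMERIC_FIELDS = ("pid", "extern_pid", "line", "retries", "err", "rc")


def parse_quota_message(text):
    """Return the fields of a quota_chk_acq_common() message, or None."""
    match = LOG_RE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    for key in NUMERIC_FIELDS:
        fields[key] = int(fields[key])
    return fields


# Sample line copied from the syslog excerpt in this ticket.
sample = (
    "Jun 25 03:03:59 nos141i kernel: Lustre: 28760:0:"
    "(quota_interface.c:481:quota_chk_acq_common()) still haven't managed to "
    "acquire quota space from the quota master after 10 retries (err=0, rc=0)"
)

info = parse_quota_message(sample)
print(info["pid"], info["extern_pid"], info["retries"])
```

            Running it over a whole syslog file, one line at a time, would show which pids log the message and how high the retry count gets, which may help decide when to collect the D_QUOTA debug log.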

            ihara Shuichi Ihara (Inactive) added a comment -

            Niu, I meant that you could make a debug patch so we can see more detailed information on the production system, e.g. who exceeded the quota limit when this message was logged. We certainly didn't do any quota operations (set/clear quota) when the following message showed up.

            Jun 25 03:03:59 nos141i kernel: Lustre: 28760:0:(quota_interface.c:481:quota_chk_acq_common()) still haven't managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)

            We don't know how to reproduce this problem at the moment, but let me ask whether we can enable the debug flags. BTW, what is "28760:0" in the above message?

            People

              niu Niu Yawei (Inactive)
              ihara Shuichi Ihara (Inactive)