[LU-17191] sanity-quota test_1b, 1d, 1f, 1i: FAIL: user write success, but expect EDQUOT

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • Lustre 2.16.0
    • None
    • 3

    Description

      Tests sanity-quota 1b, 1d, 1f, and 1i regularly fail on my local VM on the latest master (d8d4df24c6924). No special steps are needed to reproduce the failure:

      uname -a
      Linux vm1 3.10.0-1160.49.1.el7_lustre.x86_64 #1 SMP Fri Jun 17 18:46:08 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
      ...
      bash ./llmount.sh
      ONLY=1 bash ./sanity-quota.sh
      ...
      == sanity-quota test complete, duration 287 sec ========== 02:28:36 (1697149716)
      sanity-quota: FAIL: test_1b user write success, but expect EDQUOT
      sanity-quota: FAIL: test_1d user write success, but expect EDQUOT
      sanity-quota: FAIL: test_1f user write success, but expect EDQUOT
      sanity-quota: FAIL: test_1i user write success, but expect EDQUOT
      === sanity-quota: start cleanup 02:28:36 (1697149716) === 

      At first glance the problem is on the client side: osc_quota_chkdq() does not return EDQUOT even though the client received the appropriate flag from the server:

      00000008:00000001:1.0:1697151056.504596:0:14647:0:(osc_request.c:2130:osc_brw_fini_request()) Process entered
      00000008:04000000:1.0:1697151056.504599:0:14647:0:(osc_request.c:2153:osc_brw_fini_request()) setdq for [1000 1000 0] with valid 0x18000006b584fb9, flags 6100
      00000001:00000001:1.0:1697151056.504604:0:14647:0:(osc_quota.c:92:osc_quota_setdq()) Process entered
      00000001:00000001:1.0:1697151056.504609:0:14647:0:(osc_quota.c:166:osc_quota_setdq()) Process leaving (rc=18446744073709551600 : -16 : fffffffffffffff0)
      00000008:00000001:1.0:1697151056.504614:0:14647:0:(osc_request.c:2185:osc_brw_fini_request()) Process leaving via out (rc=0 : 0 : 0x0) 
      00000008:00000001:1.0:1697151056.504618:0:14647:0:(osc_request.c:2399:osc_brw_fini_request()) Process leaving (rc=0 : 0 : 0) 
      ...
      00000001:00000001:3.0:1697151061.710836:0:2118:0:(osc_quota.c:40:osc_quota_chkdq()) Process entered
      00000001:00000001:3.0:1697151061.710837:0:2118:0:(osc_quota.c:55:osc_quota_chkdq()) Process leaving (rc=0 : 0 : 0)
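
      The "Process leaving" line for osc_quota_setdq() prints the same return code three ways: unsigned 64-bit, signed, and hex. The sketch below is a small userspace illustration (not Lustre code) of how the unsigned value maps back to a negative errno. The helper names decode_rc() and rc_is_ebusy() are hypothetical, and the sketch assumes Linux errno values and the usual two's-complement conversion:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Reinterpret the unsigned 64-bit rc printed in the debug log as a
 * signed value. Hypothetical helper, for illustration only.
 */
static int64_t decode_rc(uint64_t raw)
{
	return (int64_t)raw;
}

/* Return 1 when the raw rc from the log corresponds to -EBUSY. */
static int rc_is_ebusy(uint64_t raw)
{
	return decode_rc(raw) == -EBUSY;
}
```

      Passing 18446744073709551600 (0xfffffffffffffff0) to decode_rc() gives -16, which is -EBUSY on Linux.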
       

      osc_quota_setdq() returns -EBUSY (rc=-16 in the log above), which in my view should be handled differently:

      diff --git a/lustre/osc/osc_quota.c b/lustre/osc/osc_quota.c
      index b127361..f06276e 100644
      --- a/lustre/osc/osc_quota.c
      +++ b/lustre/osc/osc_quota.c
      @@ -129,6 +129,8 @@ int osc_quota_setdq(struct client_obd *cli, u64 xid, const unsigned int qid[],
                              bits |= BIT(type);
                              rc = xa_insert(&cli->cl_quota_exceeded_ids, qid[type],
                                             xa_mk_value(bits), GFP_KERNEL);
      +                       if (rc == -EBUSY)
      +                               continue;
                              if (rc)
                                      break; 

      However, the above fix does not help in my case and the tests continue to fail. My guess is that xa_insert() should return 0 here, and the fact that it does not is the real problem.
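
      For background, the kernel's xa_insert() refuses to overwrite an existing entry and returns -EBUSY, while xa_store() replaces whatever is there. The following is a minimal userspace model of that difference, not the code from the eventual patch: the model_* names, the fixed-size table, and the convention that 0 means "no entry" are all assumptions made for illustration.

```c
#include <errno.h>

#define MODEL_SLOTS 2048	/* hypothetical table size for this model */

/* Hypothetical stand-in for the xarray: slot value 0 means "no entry". */
struct model_table {
	unsigned long slot[MODEL_SLOTS];
};

/* Models xa_insert(): never overwrites, returns -EBUSY if occupied. */
static int model_insert(struct model_table *t, unsigned int id,
			unsigned long bits)
{
	if (id >= MODEL_SLOTS)
		return -EINVAL;
	if (t->slot[id])
		return -EBUSY;
	t->slot[id] = bits;
	return 0;
}

/* Models xa_store(): overwrites unconditionally. */
static int model_store(struct model_table *t, unsigned int id,
		       unsigned long bits)
{
	if (id >= MODEL_SLOTS)
		return -EINVAL;
	t->slot[id] = bits;
	return 0;
}

/*
 * Mark quota type 'type' as exceeded for 'id': insert only when the id
 * has no entry yet, store when an existing entry gains a new bit.
 */
static int model_set_exceeded(struct model_table *t, unsigned int id,
			      unsigned int type)
{
	unsigned long old, bits;

	if (id >= MODEL_SLOTS)
		return -EINVAL;
	old = t->slot[id];
	bits = old | (1UL << type);
	if (bits == old)
		return 0;	/* bit already set, nothing to do */
	return old ? model_store(t, id, bits) : model_insert(t, id, bits);
}
```

      In this model, calling model_insert() a second time for the same id fails with -EBUSY, whereas model_set_exceeded() merges the new type bit into the existing entry.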

      I tried reverting "LU-8130 osc: convert osc_quota hash to xarray" (ac8c28f959d87c), and the tests stopped failing.

      simmonsja, can you take a look? I'll push a revert for ac8c28f959d, but if you can prepare a quick fix I will abandon my revert and help you move forward with it.

          Activity

            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.16.0 [ 15190 ]
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment -

            Merged for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52713/
            Subject: LU-17191 osc: only call xa_insert for new entries
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 67e0d9e40acc6adcebf89e2a4ac3860f0c4273d2

            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-17046 [ LU-17046 ]

            simmonsja James A Simmons added a comment -

            I found the source of the failures. sanity-quota_1g is failing due to ("LU-13810 tests: increase limit for 1g").

            scherementsev Sergey Cheremencev added a comment -

            Hi simmonsja, if you say that sanity-quota_1g fails due to LU-17046, can you give any details?

            You have probably looked into the logs or found out why it fails. Ideally, please post a link to the failure and a short explanation. That could save us time in finally fixing LU-17046.

            Thanks.

            simmonsja James A Simmons added a comment -

            The failure of sanity-quota 1g is LU-17046, which was reported before the Xarray patch landed. So I wouldn't count out the Xarray work.

            adilger Andreas Dilger added a comment -

            Hmm, the most recent patch still fails Janitor for sanity-quota test_1g.

            Stephane had an interesting issue with project quota on the client (LU-16771), which suggests that caching the (project) quota results on the client for a few seconds would improve performance for applications that are statfs()-intensive when project quotas are in use.

            I wonder if it makes sense to change the current xarray implementation for the quota so that it can cache at least the project quota information (usage/limit), and potentially also user/group quota, to avoid frequent RPCs. The slight drawback is that the quota tests would probably need to disable this cache, but that could easily be done by setting "llite.*.statfs_max_age=0" or =1.

            It might make sense to change to an rhashtable at that point, since the Xarray implementation continues to have problems. Alternatively, we could store the project (+user+group?) quota as the Xarray value and store the "over quota" state as a mark on the Xarray entry?

            Thoughts?

            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-8130 [ LU-8130 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-16771 [ LU-16771 ]

            People

              simmonsja James A Simmons
              scherementsev Sergey Cheremencev