Lustre / LU-17191

sanity-quota test_1b, 1d, 1f, 1i: FAIL: user write success, but expect EDQUOT


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.16.0
    • Fix Version/s: Lustre 2.16.0
    • Labels: None
    • Severity: 3

    Description

      Tests sanity-quota 1b, 1d, 1f and 1i regularly fail on my local VM with the latest master (d8d4df24c6924). Nothing special is needed to reproduce them:

      uname -a
      Linux vm1 3.10.0-1160.49.1.el7_lustre.x86_64 #1 SMP Fri Jun 17 18:46:08 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
      ...
      bash ./llmount.sh
      ONLY=1 bash ./sanity-quota.sh
      ...
      == sanity-quota test complete, duration 287 sec ========== 02:28:36 (1697149716)
      sanity-quota: FAIL: test_1b user write success, but expect EDQUOT
      sanity-quota: FAIL: test_1d user write success, but expect EDQUOT
      sanity-quota: FAIL: test_1f user write success, but expect EDQUOT
      sanity-quota: FAIL: test_1i user write success, but expect EDQUOT
      === sanity-quota: start cleanup 02:28:36 (1697149716) === 

      At first glance, the problem is on the client side: osc_quota_chkdq() does not return EDQUOT even though the client received the appropriate flag from the server:

      00000008:00000001:1.0:1697151056.504596:0:14647:0:(osc_request.c:2130:osc_brw_fini_request()) Process entered
      00000008:04000000:1.0:1697151056.504599:0:14647:0:(osc_request.c:2153:osc_brw_fini_request()) setdq for [1000 1000 0] with valid 0x18000006b584fb9, flags 6100
      00000001:00000001:1.0:1697151056.504604:0:14647:0:(osc_quota.c:92:osc_quota_setdq()) Process entered
      00000001:00000001:1.0:1697151056.504609:0:14647:0:(osc_quota.c:166:osc_quota_setdq()) Process leaving (rc=18446744073709551600 : -16 : fffffffffffffff0)
      00000008:00000001:1.0:1697151056.504614:0:14647:0:(osc_request.c:2185:osc_brw_fini_request()) Process leaving via out (rc=0 : 0 : 0x0) 
      00000008:00000001:1.0:1697151056.504618:0:14647:0:(osc_request.c:2399:osc_brw_fini_request()) Process leaving (rc=0 : 0 : 0) 
      ...
      00000001:00000001:3.0:1697151061.710836:0:2118:0:(osc_quota.c:40:osc_quota_chkdq()) Process entered
      00000001:00000001:3.0:1697151061.710837:0:2118:0:(osc_quota.c:55:osc_quota_chkdq()) Process leaving (rc=0 : 0 : 0)
       

      osc_quota_setdq() fails with -EBUSY (rc=-16 in the log above), which in my view should be handled differently:

      diff --git a/lustre/osc/osc_quota.c b/lustre/osc/osc_quota.c
      index b127361..f06276e 100644
      --- a/lustre/osc/osc_quota.c
      +++ b/lustre/osc/osc_quota.c
      @@ -129,6 +129,8 @@ int osc_quota_setdq(struct client_obd *cli, u64 xid, const unsigned int qid[],
                              bits |= BIT(type);
                              rc = xa_insert(&cli->cl_quota_exceeded_ids, qid[type],
                                             xa_mk_value(bits), GFP_KERNEL);
      +                       if (rc == -EBUSY)
      +                               continue;
                              if (rc)
                                      break; 

      However, the fix above doesn't help in my case, and the tests still fail. I suspect the real problem is that xa_insert() is expected to return 0 here but doesn't.

      I tried reverting "LU-8130 osc: convert osc_quota hash to xarray" (ac8c28f959d87c), and the tests stopped failing.

      simmonsja, can you take a look? I'll push a revert of ac8c28f959d, but if you can prepare a quick fix, I will abandon my revert and help you move forward with it.

            People

              Assignee: James A Simmons (simmonsja)
              Reporter: Sergey Cheremencev (scherementsev)
              Votes: 0
              Watchers: 8
