Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-2289

still haven't managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)

Details

    • Bug
    • Resolution: Not a Bug
    • Minor
    • None
    • Lustre 1.8.8
    • Lustre 1.8.8 + LU-1720
    • 3
    • 5479

    Description

      Even after adding the patch from LU-1720, we are still seeing messages like:
      Lustre: 18271:0:(quota_interface.c:475:quota_chk_acq_common()) still haven't managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)

      At this site, we haven't added any other patches to 1.8.8. What do these messages mean? Is it possible that some of the other patches (like LU-1438) could fix these?

      The customer hasn't noticed any functional issues, but of course that doesn't mean there aren't any. Quotas >4TB work on this system.

      Attachments

        1. kern.log-mds
          1.90 MB
        2. kern.log-mds-Aug30
          97 kB
        3. kern.log-oss
          0.2 kB
        4. kern.log-oss-Aug30
          305 kB

        Activity

          [LU-2289] still haven't managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)

          Closing old bug.

          adilger Andreas Dilger added a comment - Closing old bug.

          Yes, when approaching limit, write performance will be lower because quota unit is very small when approaching limit, and there will be much more quota acquire/release operations between ost & mdt, even worse, the write operation has to be blocked on the quota acquire operation sometimes. We didn't measure the performance in such unusual case.

          niu Niu Yawei (Inactive) added a comment - Yes, when approaching limit, write performance will be lower because quota unit is very small when approaching limit, and there will be much more quota acquire/release operations between ost & mdt, even worse, the write operation has to be blocked on the quota acquire operation sometimes. We didn't measure the performance in such unusual case.

          Is it possible that this could cause performance issues when the UID/GID is approaching the quota limit? Do you know if there have ever been performance tests run for this case?

          Thanks.

          kitwestneat Kit Westneat (Inactive) added a comment - Is it possible that this could cause performance issues when the UID/GID is approaching the quota limit? Do you know if there have ever been performance tests run for this case? Thanks.

          Now I think it isn't a problem after I checked the large log for all processes (the small one I checked before is only for single process).

          When some uid/gid is approaching quota limit, the qunit size will be shrinked to the minimal size (1MB), and each quota acquire can only acquire 1MB limit, if there are many OST threads writing to the same uid/gid at this time, the acquired limit will be consumed very soon, so some writing threads have to recycle several times in the quota_chk_acq_common() until get required quota limit or fail with -EDQUOT at the end.

          niu Niu Yawei (Inactive) added a comment - Now I think it isn't a problem after I checked the large log for all processes (the small one I checked before is only for single process). When some uid/gid is approaching quota limit, the qunit size will be shrinked to the minimal size (1MB), and each quota acquire can only acquire 1MB limit, if there are many OST threads writing to the same uid/gid at this time, the acquired limit will be consumed very soon, so some writing threads have to recycle several times in the quota_chk_acq_common() until get required quota limit or fail with -EDQUOT at the end.

          Thank you, Kit. I checked the OSS log:

          00040000:00000400:6:1364224183.636169:0:1558:0:(quota_interface.c:475:quota_chk_acq_common()) still haven't managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)
          00040000:04000000:6:1364224183.636179:0:1558:0:(quota_interface.c:482:quota_chk_acq_common()) recheck quota with rc: 0, cycle: 10
          00040000:04000000:6:1364224183.636181:0:1558:0:(quota_adjust_qunit.c:172:quota_search_lqs()) lqs(ffff810846b6a280) id(19302) flag(2147483649) type(g) bunit(1048576) btune(524288) iunit(5120) itune(2560) lqs_bwrite_pending(6340608) lqs_iwrite_pending(0) ino_rec(0) blk_rec(2097152)refcount(11): search lqs
          00040000:04000000:6:1364224183.636190:0:1558:0:(quota_interface.c:345:quota_check_common()) count=256 lqs_pending=6340608 qd_count=5087232 isblk=2 mb=0 pending[1]=1056768
          00040000:04000000:6:1364224183.636193:0:1558:0:(quota_adjust_qunit.c:172:quota_search_lqs()) lqs(ffff810846b6a280) id(19302) flag(2147483649) type(g) bunit(1048576) btune(524288) iunit(5120) itune(2560) lqs_bwrite_pending(6340608) lqs_iwrite_pending(0) ino_rec(0) blk_rec(2097152)refcount(11): search lqs
          00002000:04000000:6:1364224183.636199:0:1558:0:(fsfilt-ldiskfs.c:1601:fsfilt_ldiskfs_quotactl()) quotactl command 0x800007, id 10311, type 0 failed: -3
          00040000:04000000:6:1364224183.636205:0:1558:0:(quota_adjust_qunit.c:172:quota_search_lqs()) lqs(ffff810846b6a280) id(19302) flag(2147483649) type(g) bunit(1048576) btune(524288) iunit(5120) itune(2560) lqs_bwrite_pending(6340608) lqs_iwrite_pending(0) ino_rec(0) blk_rec(2097152)refcount(11): search lqs
          00040000:04000000:6:1364224183.636209:0:1558:0:(quota_context.c:347:check_cur_qunit()) type: b, limit: 50417631232, usage: 50416787456, pending_write: 6340608, record: 2097152, qunit_sz: 1048576, tune_sz: 524288, ret: 0.
          00040000:04000000:6:1364224183.636214:0:1558:0:(quota_interface.c:482:quota_chk_acq_common()) recheck quota with rc: 0, cycle: 11
          00040000:04000000:6:1364224183.636217:0:1558:0:(quota_adjust_qunit.c:172:quota_search_lqs()) lqs(ffff810846b6a280) id(19302) flag(2147483649) type(g) bunit(1048576) btune(524288) iunit(5120) itune(2560) lqs_bwrite_pending(6340608) lqs_iwrite_pending(0) ino_rec(0) blk_rec(2097152)refcount(11): search lqs
          00040000:04000000:6:1364224183.636224:0:1558:0:(quota_interface.c:345:quota_check_common()) count=256 lqs_pending=6340608 qd_count=5087232 isblk=2 mb=0 pending[1]=1056768
          00040000:04000000:6:1364224183.636227:0:1558:0:(quota_adjust_qunit.c:172:quota_search_lqs()) lqs(ffff810846b6a280) id(19302) flag(2147483649) type(g) bunit(1048576) btune(524288) iunit(5120) itune(2560) lqs_bwrite_pending(6340608) lqs_iwrite_pending(0) ino_rec(0) blk_rec(2097152)refcount(11): search lqs
          00002000:04000000:6:1364224183.636231:0:1558:0:(fsfilt-ldiskfs.c:1601:fsfilt_ldiskfs_quotactl()) quotactl command 0x800007, id 10311, type 0 failed: -3
          00040000:04000000:6:1364224183.636239:0:1558:0:(quota_adjust_qunit.c:172:quota_search_lqs()) lqs(ffff810846b6a280) id(19302) flag(2147483649) type(g) bunit(1048576) btune(524288) iunit(5120) itune(2560) lqs_bwrite_pending(6340608) lqs_iwrite_pending(0) ino_rec(0) blk_rec(2097152)refcount(11): search lqs
          00040000:04000000:6:1364224183.636243:0:1558:0:(quota_context.c:347:check_cur_qunit()) type: b, limit: 50417631232, usage: 50416787456, pending_write: 6340608, record: 2097152, qunit_sz: 1048576, tune_sz: 524288, ret: 0.
          

          Looks there is pending acquire request (record: 2097152), and local limit will be able to satisfy current write request when the pending acquire done, so OST should wait for the pending reqeust, but it actually didn't wait, so quota_chk_acq_common() recycled many times until the pending request done.

          I didn't see why OST didn't wait on pending acquire so far, maybe there is something wrong with the lqs_blk_rec calculation? I'll look into it further.

          niu Niu Yawei (Inactive) added a comment - Thank you, Kit. I checked the OSS log: 00040000:00000400:6:1364224183.636169:0:1558:0:(quota_interface.c:475:quota_chk_acq_common()) still haven't managed to acquire quota space from the quota master after 10 retries (err=0, rc=0) 00040000:04000000:6:1364224183.636179:0:1558:0:(quota_interface.c:482:quota_chk_acq_common()) recheck quota with rc: 0, cycle: 10 00040000:04000000:6:1364224183.636181:0:1558:0:(quota_adjust_qunit.c:172:quota_search_lqs()) lqs(ffff810846b6a280) id(19302) flag(2147483649) type(g) bunit(1048576) btune(524288) iunit(5120) itune(2560) lqs_bwrite_pending(6340608) lqs_iwrite_pending(0) ino_rec(0) blk_rec(2097152)refcount(11): search lqs 00040000:04000000:6:1364224183.636190:0:1558:0:(quota_interface.c:345:quota_check_common()) count=256 lqs_pending=6340608 qd_count=5087232 isblk=2 mb=0 pending[1]=1056768 00040000:04000000:6:1364224183.636193:0:1558:0:(quota_adjust_qunit.c:172:quota_search_lqs()) lqs(ffff810846b6a280) id(19302) flag(2147483649) type(g) bunit(1048576) btune(524288) iunit(5120) itune(2560) lqs_bwrite_pending(6340608) lqs_iwrite_pending(0) ino_rec(0) blk_rec(2097152)refcount(11): search lqs 00002000:04000000:6:1364224183.636199:0:1558:0:(fsfilt-ldiskfs.c:1601:fsfilt_ldiskfs_quotactl()) quotactl command 0x800007, id 10311, type 0 failed: -3 00040000:04000000:6:1364224183.636205:0:1558:0:(quota_adjust_qunit.c:172:quota_search_lqs()) lqs(ffff810846b6a280) id(19302) flag(2147483649) type(g) bunit(1048576) btune(524288) iunit(5120) itune(2560) lqs_bwrite_pending(6340608) lqs_iwrite_pending(0) ino_rec(0) blk_rec(2097152)refcount(11): search lqs 00040000:04000000:6:1364224183.636209:0:1558:0:(quota_context.c:347:check_cur_qunit()) type: b, limit: 50417631232, usage: 50416787456, pending_write: 6340608, record: 2097152, qunit_sz: 1048576, tune_sz: 524288, ret: 0. 00040000:04000000:6:1364224183.636214:0:1558:0:(quota_interface.c:482:quota_chk_acq_common()) recheck quota with rc: 0, cycle: 11 00040000:04000000:6:1364224183.636217:0:1558:0:(quota_adjust_qunit.c:172:quota_search_lqs()) lqs(ffff810846b6a280) id(19302) flag(2147483649) type(g) bunit(1048576) btune(524288) iunit(5120) itune(2560) lqs_bwrite_pending(6340608) lqs_iwrite_pending(0) ino_rec(0) blk_rec(2097152)refcount(11): search lqs 00040000:04000000:6:1364224183.636224:0:1558:0:(quota_interface.c:345:quota_check_common()) count=256 lqs_pending=6340608 qd_count=5087232 isblk=2 mb=0 pending[1]=1056768 00040000:04000000:6:1364224183.636227:0:1558:0:(quota_adjust_qunit.c:172:quota_search_lqs()) lqs(ffff810846b6a280) id(19302) flag(2147483649) type(g) bunit(1048576) btune(524288) iunit(5120) itune(2560) lqs_bwrite_pending(6340608) lqs_iwrite_pending(0) ino_rec(0) blk_rec(2097152)refcount(11): search lqs 00002000:04000000:6:1364224183.636231:0:1558:0:(fsfilt-ldiskfs.c:1601:fsfilt_ldiskfs_quotactl()) quotactl command 0x800007, id 10311, type 0 failed: -3 00040000:04000000:6:1364224183.636239:0:1558:0:(quota_adjust_qunit.c:172:quota_search_lqs()) lqs(ffff810846b6a280) id(19302) flag(2147483649) type(g) bunit(1048576) btune(524288) iunit(5120) itune(2560) lqs_bwrite_pending(6340608) lqs_iwrite_pending(0) ino_rec(0) blk_rec(2097152)refcount(11): search lqs 00040000:04000000:6:1364224183.636243:0:1558:0:(quota_context.c:347:check_cur_qunit()) type: b, limit: 50417631232, usage: 50416787456, pending_write: 6340608, record: 2097152, qunit_sz: 1048576, tune_sz: 524288, ret: 0. Looks there is pending acquire request (record: 2097152), and local limit will be able to satisfy current write request when the pending acquire done, so OST should wait for the pending reqeust, but it actually didn't wait, so quota_chk_acq_common() recycled many times until the pending request done. I didn't see why OST didn't wait on pending acquire so far, maybe there is something wrong with the lqs_blk_rec calculation? I'll look into it further.

          Hi Niu,

          I've uploaded the logs here:
          http://eu.ddn.com:8080/lustre/LU-2289.tar.bz2

          It's about a 3 second window on each server, but it's still about 20GB uncompressed. It looks like for the message at 1364224183, it tries 19 times before it finally acquires it. Is there anything in the MDT logs that would indicate why it took so long? I didn't see anything in particular, but it's tough to read.

          kitwestneat Kit Westneat (Inactive) added a comment - Hi Niu, I've uploaded the logs here: http://eu.ddn.com:8080/lustre/LU-2289.tar.bz2 It's about a 3 second window on each server, but it's still about 20GB uncompressed. It looks like for the message at 1364224183, it tries 19 times before it finally acquires it. Is there anything in the MDT logs that would indicate why it took so long? I didn't see anything in particular, but it's tough to read.

          Hi, Kit

          If you have debug log, you can see if the quota is acquired eventually by checking the pid. (find the pid in the message of "still haven't managed ...", and check if this thread is constantly printing such message or it moves on eventually).

          When you attach the log, I suggest you truncate out the useful part, otherwise, it could be too large. The useful part should be the messages nearby "still haven't managed ..." OST and the corresponding messages on MDS.

          I didn't find anything abnormal in the output of "lfs quota -u -v", thank you for providing this.

          niu Niu Yawei (Inactive) added a comment - Hi, Kit If you have debug log, you can see if the quota is acquired eventually by checking the pid. (find the pid in the message of "still haven't managed ...", and check if this thread is constantly printing such message or it moves on eventually). When you attach the log, I suggest you truncate out the useful part, otherwise, it could be too large. The useful part should be the messages nearby "still haven't managed ..." OST and the corresponding messages on MDS. I didn't find anything abnormal in the output of "lfs quota -u -v", thank you for providing this.

          Here are the quotas with -v:

          [root@Jet:fe6 ~]# lfs quota -v -g hfipprd /lfs1
          Disk quotas for group hfipprd (gid 10074):
          Filesystem kbytes quota limit grace files quota limit
          grace
          /lfs1 0 31457280 33030144 - 0 0
          0 -
          lfs1-MDT0000_UUID
          0 - 65536 - 0 - 0
          -
          lfs1-OST0000_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0001_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0002_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0003_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0004_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0005_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0006_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0007_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0008_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0009_UUID
          0 - 65536 - - - -
          -
          lfs1-OST000a_UUID
          0 - 65536 - - - -
          -
          lfs1-OST000b_UUID
          0 - 65536 - - - -
          -
          lfs1-OST000c_UUID
          0 - 65536 - - - -
          -
          lfs1-OST000d_UUID
          0 - 65536 - - - -
          -
          lfs1-OST000e_UUID
          0 - 65536 - - - -
          -
          lfs1-OST000f_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0010_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0011_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0012_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0013_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0014_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0015_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0016_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0017_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0018_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0019_UUID
          0 - 65536 - - - -
          -
          lfs1-OST001a_UUID
          0 - 65536 - - - -
          -
          lfs1-OST001b_UUID
          0 - 65536 - - - -
          -
          lfs1-OST001c_UUID
          0 - 65536 - - - -
          -
          lfs1-OST001d_UUID
          0 - 65536 - - - -
          -
          lfs1-OST001e_UUID
          0 - 65536 - - - -
          -
          lfs1-OST001f_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0020_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0021_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0022_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0023_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0024_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0025_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0026_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0027_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0028_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0029_UUID
          0 - 65536 - - - -
          -
          lfs1-OST002a_UUID
          0 - 65536 - - - -
          -
          lfs1-OST002b_UUID
          0 - 65536 - - - -
          -
          lfs1-OST002c_UUID
          0 - 65536 - - - -
          -
          lfs1-OST002d_UUID
          0 - 65536 - - - -
          -
          lfs1-OST002e_UUID
          0 - 65536 - - - -
          -
          lfs1-OST002f_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0030_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0031_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0032_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0033_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0034_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0035_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0036_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0037_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0038_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0039_UUID
          0 - 65536 - - - -
          -
          lfs1-OST003a_UUID
          0 - 65536 - - - -
          -
          lfs1-OST003b_UUID
          0 - 65536 - - - -
          -
          lfs1-OST003c_UUID
          0 - 65536 - - - -
          -
          lfs1-OST003d_UUID
          0 - 65536 - - - -
          -
          lfs1-OST003e_UUID
          0 - 65536 - - - -
          -
          lfs1-OST003f_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0040_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0041_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0042_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0043_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0044_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0045_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0046_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0047_UUID
          0 - 65536 - - - -
          -

          [root@Jet:fe6 ~]# lfs quota -v -g hfip-gfdl /lfs1
          Disk quotas for group hfip-gfdl (gid 19103):
          Filesystem kbytes quota limit grace files quota limit
          grace
          /lfs1 19179828108 20971520000 22020096000 - 3567465
          0 0 -
          lfs1-MDT0000_UUID
          1408244 - 1572864 - 3567465 - 0
          -
          lfs1-OST0000_UUID
          266307944 - 266469376 - - -

          • -
            lfs1-OST0001_UUID
            264490536 - 264634368 - - -
          • -
            lfs1-OST0002_UUID
            252475224 - 252575744 - - -
          • -
            lfs1-OST0003_UUID
            267990876 - 268173312 - - -
          • -
            lfs1-OST0004_UUID
            246162144 - 246284288 - - -
          • -
            lfs1-OST0005_UUID
            267940220 - 268042240 - - -
          • -
            lfs1-OST0006_UUID
            290532840 - 290717696 - - -
          • -
            lfs1-OST0007_UUID
            283466396 - 283639808 - - -
          • -
            lfs1-OST0008_UUID
            272318500 - 272498688 - - -
          • -
            lfs1-OST0009_UUID
            250366252 - 250478592 - - -
          • -
            lfs1-OST000a_UUID
            268765900 - 268959744 - - -
          • -
            lfs1-OST000b_UUID
            241840752 - 241958912 - - -
          • -
            lfs1-OST000c_UUID
            279475784 - 279576576 - - -
          • -
            lfs1-OST000d_UUID
            295138616 - 295305216 - - -
          • -
            lfs1-OST000e_UUID
            245769816 - 245891072 - - -
          • -
            lfs1-OST000f_UUID
            258449640 - 258605056 - - -
          • -
            lfs1-OST0010_UUID
            228665128 - 228851712 - - -
          • -
            lfs1-OST0011_UUID
            278386580 - 278528000 - - -
          • -
            lfs1-OST0012_UUID
            277253652 - 277348352 - - -
          • -
            lfs1-OST0013_UUID
            185330656 - 185466880 - - -
          • -
            lfs1-OST0014_UUID
            281571296 - 281673728 - - -
          • -
            lfs1-OST0015_UUID
            246132636 - 246284288 - - -
          • -
            lfs1-OST0016_UUID
            274774304 - 274857984 - - -
          • -
            lfs1-OST0017_UUID
            279671348 - 279838720 - - -
          • -
            lfs1-OST0018_UUID
            278511144 - 278659072 - - -
          • -
            lfs1-OST0019_UUID
            279721016 - 279838720 - - -
          • -
            lfs1-OST001a_UUID
            286078396 - 286261248 - - -
          • -
            lfs1-OST001b_UUID
            284810104 - 284950528 - - -
          • -
            lfs1-OST001c_UUID
            266730624 - 266862592 - - -
          • -
            lfs1-OST001d_UUID
            267699528 - 267780096 - - -
          • -
            lfs1-OST001e_UUID
            271926964 - 272105472 - - -
          • -
            lfs1-OST001f_UUID
            279551644 - 279707648 - - -
          • -
            lfs1-OST0020_UUID
            262360628 - 262537216 - - -
          • -
            lfs1-OST0021_UUID
            286415060 - 286523392 - - -
          • -
            lfs1-OST0022_UUID
            227815636 - 227934208 - - -
          • -
            lfs1-OST0023_UUID
            271352884 - 271450112 - - -
          • -
            lfs1-OST0024_UUID
            272460280 - 272629760 - - -
          • -
            lfs1-OST0025_UUID
            198046512 - 198180864 - - -
          • -
            lfs1-OST0026_UUID
            289555288 - 289669120 - - -
          • -
            lfs1-OST0027_UUID
            258436324 - 258605056 - - -
          • -
            lfs1-OST0028_UUID
            270941900 - 271056896 - - -
          • -
            lfs1-OST0029_UUID
            251377320 - 251527168 - - -
          • -
            lfs1-OST002a_UUID
            282350136 - 282460160 - - -
          • -
            lfs1-OST002b_UUID
            280608276 - 280756224 - - -
          • -
            lfs1-OST002c_UUID
            284707076 - 284819456 - - -
          • -
            lfs1-OST002d_UUID
            283481900 - 283639808 - - -
          • -
            lfs1-OST002e_UUID
            283875928 - 284033024 - - -
          • -
            lfs1-OST002f_UUID
            241472472 - 241565696 - - -
          • -
            lfs1-OST0030_UUID
            265680488 - 265814016 - - -
          • -
            lfs1-OST0031_UUID
            273082788 - 273154048 - - -
          • -
            lfs1-OST0032_UUID
            227469276 - 227540992 - - -
          • -
            lfs1-OST0033_UUID
            301283952 - 301465600 - - -
          • -
            lfs1-OST0034_UUID
            282200664 - 282329088 - - -
          • -
            lfs1-OST0035_UUID
            236723324 - 236847104 - - -
          • -
            lfs1-OST0036_UUID
            244762536 - 244842496 - - -
          • -
            lfs1-OST0037_UUID
            290396060 - 290586624 - - -
          • -
            lfs1-OST0038_UUID
            247838916 - 247988224 - - -
          • -
            lfs1-OST0039_UUID
            289519172 - 289669120 - - -
          • -
            lfs1-OST003a_UUID
            237240432 - 237371392 - - -
          • -
            lfs1-OST003b_UUID
            281243232 - 281411584 - - -
          • -
            lfs1-OST003c_UUID
            255128092 - 255197184 - - -
          • -
            lfs1-OST003d_UUID
            267801692 - 267911168 - - -
          • -
            lfs1-OST003e_UUID
            268842936 - 268959744 - - -
          • -
            lfs1-OST003f_UUID
            259531064 - 259653632 - - -
          • -
            lfs1-OST0040_UUID
            270812788 - 270925824 - - -
          • -
            lfs1-OST0041_UUID
            256516360 - 256638976 - - -
          • -
            lfs1-OST0042_UUID
            289071672 - 289144832 - - -
          • -
            lfs1-OST0043_UUID
            290809280 - 290979840 - - -
          • -
            lfs1-OST0044_UUID
            265673448 - 265814016 - - -
          • -
            lfs1-OST0045_UUID
            288587388 - 288751616 - - -
          • -
            lfs1-OST0046_UUID
            291939312 - 292028416 - - -
          • -
            lfs1-OST0047_UUID
            232700912 - 232783872 - - -
          • -

          [root@Jet:fe6 ~]# lfs quota -v -g rthfip-gfdl /lfs1
          Disk quotas for group rthfip-gfdl (gid 19104):
          Filesystem kbytes quota limit grace files quota limit
          grace
          /lfs1 0 31457280 33030144 - 0 0
          0 -
          lfs1-MDT0000_UUID
          0 - 65536 - 0 - 0
          -
          lfs1-OST0000_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0001_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0002_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0003_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0004_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0005_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0006_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0007_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0008_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0009_UUID
          0 - 65536 - - - -
          -
          lfs1-OST000a_UUID
          0 - 65536 - - - -
          -
          lfs1-OST000b_UUID
          0 - 65536 - - - -
          -
          lfs1-OST000c_UUID
          0 - 65536 - - - -
          -
          lfs1-OST000d_UUID
          0 - 65536 - - - -
          -
          lfs1-OST000e_UUID
          0 - 65536 - - - -
          -
          lfs1-OST000f_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0010_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0011_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0012_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0013_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0014_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0015_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0016_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0017_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0018_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0019_UUID
          0 - 65536 - - - -
          -
          lfs1-OST001a_UUID
          0 - 65536 - - - -
          -
          lfs1-OST001b_UUID
          0 - 65536 - - - -
          -
          lfs1-OST001c_UUID
          0 - 65536 - - - -
          -
          lfs1-OST001d_UUID
          0 - 65536 - - - -
          -
          lfs1-OST001e_UUID
          0 - 65536 - - - -
          -
          lfs1-OST001f_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0020_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0021_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0022_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0023_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0024_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0025_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0026_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0027_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0028_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0029_UUID
          0 - 65536 - - - -
          -
          lfs1-OST002a_UUID
          0 - 65536 - - - -
          -
          lfs1-OST002b_UUID
          0 - 65536 - - - -
          -
          lfs1-OST002c_UUID
          0 - 65536 - - - -
          -
          lfs1-OST002d_UUID
          0 - 65536 - - - -
          -
          lfs1-OST002e_UUID
          0 - 65536 - - - -
          -
          lfs1-OST002f_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0030_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0031_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0032_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0033_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0034_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0035_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0036_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0037_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0038_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0039_UUID
          0 - 65536 - - - -
          -
          lfs1-OST003a_UUID
          0 - 65536 - - - -
          -
          lfs1-OST003b_UUID
          0 - 65536 - - - -
          -
          lfs1-OST003c_UUID
          0 - 65536 - - - -
          -
          lfs1-OST003d_UUID
          0 - 65536 - - - -
          -
          lfs1-OST003e_UUID
          0 - 65536 - - - -
          -
          lfs1-OST003f_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0040_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0041_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0042_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0043_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0044_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0045_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0046_UUID
          0 - 65536 - - - -
          -
          lfs1-OST0047_UUID
          0 - 65536 - - - -
          -

          kitwestneat Kit Westneat (Inactive) added a comment - Here are the quotas with -v: [root@Jet:fe6 ~] # lfs quota -v -g hfipprd /lfs1 Disk quotas for group hfipprd (gid 10074): Filesystem kbytes quota limit grace files quota limit grace /lfs1 0 31457280 33030144 - 0 0 0 - lfs1-MDT0000_UUID 0 - 65536 - 0 - 0 - lfs1-OST0000_UUID 0 - 65536 - - - - - lfs1-OST0001_UUID 0 - 65536 - - - - - lfs1-OST0002_UUID 0 - 65536 - - - - - lfs1-OST0003_UUID 0 - 65536 - - - - - lfs1-OST0004_UUID 0 - 65536 - - - - - lfs1-OST0005_UUID 0 - 65536 - - - - - lfs1-OST0006_UUID 0 - 65536 - - - - - lfs1-OST0007_UUID 0 - 65536 - - - - - lfs1-OST0008_UUID 0 - 65536 - - - - - lfs1-OST0009_UUID 0 - 65536 - - - - - lfs1-OST000a_UUID 0 - 65536 - - - - - lfs1-OST000b_UUID 0 - 65536 - - - - - lfs1-OST000c_UUID 0 - 65536 - - - - - lfs1-OST000d_UUID 0 - 65536 - - - - - lfs1-OST000e_UUID 0 - 65536 - - - - - lfs1-OST000f_UUID 0 - 65536 - - - - - lfs1-OST0010_UUID 0 - 65536 - - - - - lfs1-OST0011_UUID 0 - 65536 - - - - - lfs1-OST0012_UUID 0 - 65536 - - - - - lfs1-OST0013_UUID 0 - 65536 - - - - - lfs1-OST0014_UUID 0 - 65536 - - - - - lfs1-OST0015_UUID 0 - 65536 - - - - - lfs1-OST0016_UUID 0 - 65536 - - - - - lfs1-OST0017_UUID 0 - 65536 - - - - - lfs1-OST0018_UUID 0 - 65536 - - - - - lfs1-OST0019_UUID 0 - 65536 - - - - - lfs1-OST001a_UUID 0 - 65536 - - - - - lfs1-OST001b_UUID 0 - 65536 - - - - - lfs1-OST001c_UUID 0 - 65536 - - - - - lfs1-OST001d_UUID 0 - 65536 - - - - - lfs1-OST001e_UUID 0 - 65536 - - - - - lfs1-OST001f_UUID 0 - 65536 - - - - - lfs1-OST0020_UUID 0 - 65536 - - - - - lfs1-OST0021_UUID 0 - 65536 - - - - - lfs1-OST0022_UUID 0 - 65536 - - - - - lfs1-OST0023_UUID 0 - 65536 - - - - - lfs1-OST0024_UUID 0 - 65536 - - - - - lfs1-OST0025_UUID 0 - 65536 - - - - - lfs1-OST0026_UUID 0 - 65536 - - - - - lfs1-OST0027_UUID 0 - 65536 - - - - - lfs1-OST0028_UUID 0 - 65536 - - - - - lfs1-OST0029_UUID 0 - 65536 - - - - - lfs1-OST002a_UUID 0 - 65536 - - - - - lfs1-OST002b_UUID 0 - 65536 - - - - - lfs1-OST002c_UUID 0 - 65536 - - - - - lfs1-OST002d_UUID 0 - 65536 - - - - - lfs1-OST002e_UUID 0 - 65536 - - - - - lfs1-OST002f_UUID 0 - 65536 - - - - - lfs1-OST0030_UUID 0 - 65536 - - - - - lfs1-OST0031_UUID 0 - 65536 - - - - - lfs1-OST0032_UUID 0 - 65536 - - - - - lfs1-OST0033_UUID 0 - 65536 - - - - - lfs1-OST0034_UUID 0 - 65536 - - - - - lfs1-OST0035_UUID 0 - 65536 - - - - - lfs1-OST0036_UUID 0 - 65536 - - - - - lfs1-OST0037_UUID 0 - 65536 - - - - - lfs1-OST0038_UUID 0 - 65536 - - - - - lfs1-OST0039_UUID 0 - 65536 - - - - - lfs1-OST003a_UUID 0 - 65536 - - - - - lfs1-OST003b_UUID 0 - 65536 - - - - - lfs1-OST003c_UUID 0 - 65536 - - - - - lfs1-OST003d_UUID 0 - 65536 - - - - - lfs1-OST003e_UUID 0 - 65536 - - - - - lfs1-OST003f_UUID 0 - 65536 - - - - - lfs1-OST0040_UUID 0 - 65536 - - - - - lfs1-OST0041_UUID 0 - 65536 - - - - - lfs1-OST0042_UUID 0 - 65536 - - - - - lfs1-OST0043_UUID 0 - 65536 - - - - - lfs1-OST0044_UUID 0 - 65536 - - - - - lfs1-OST0045_UUID 0 - 65536 - - - - - lfs1-OST0046_UUID 0 - 65536 - - - - - lfs1-OST0047_UUID 0 - 65536 - - - - - [root@Jet:fe6 ~] # lfs quota -v -g hfip-gfdl /lfs1 Disk quotas for group hfip-gfdl (gid 19103): Filesystem kbytes quota limit grace files quota limit grace /lfs1 19179828108 20971520000 22020096000 - 3567465 0 0 - lfs1-MDT0000_UUID 1408244 - 1572864 - 3567465 - 0 - lfs1-OST0000_UUID 266307944 - 266469376 - - - - lfs1-OST0001_UUID 264490536 - 264634368 - - - - lfs1-OST0002_UUID 252475224 - 252575744 - - - - lfs1-OST0003_UUID 267990876 - 268173312 - - - - lfs1-OST0004_UUID 246162144 - 246284288 - - - - lfs1-OST0005_UUID 267940220 - 268042240 - - - - lfs1-OST0006_UUID 290532840 - 290717696 - - - - lfs1-OST0007_UUID 283466396 - 283639808 - - - - lfs1-OST0008_UUID 272318500 - 272498688 - - - - lfs1-OST0009_UUID 250366252 - 250478592 - - - - lfs1-OST000a_UUID 268765900 - 268959744 - - - - lfs1-OST000b_UUID 241840752 - 241958912 - - - - lfs1-OST000c_UUID 279475784 - 279576576 - - - - lfs1-OST000d_UUID 295138616 - 295305216 - - - - lfs1-OST000e_UUID 245769816 - 245891072 - - - - lfs1-OST000f_UUID 258449640 - 258605056 - - - - lfs1-OST0010_UUID 228665128 - 228851712 - - - - lfs1-OST0011_UUID 278386580 - 278528000 - - - - lfs1-OST0012_UUID 277253652 - 277348352 - - - - lfs1-OST0013_UUID 185330656 - 185466880 - - - - lfs1-OST0014_UUID 281571296 - 281673728 - - - - lfs1-OST0015_UUID 246132636 - 246284288 - - - - lfs1-OST0016_UUID 274774304 - 274857984 - - - - lfs1-OST0017_UUID 279671348 - 279838720 - - - - lfs1-OST0018_UUID 278511144 - 278659072 - - - - lfs1-OST0019_UUID 279721016 - 279838720 - - - - lfs1-OST001a_UUID 286078396 - 286261248 - - - - lfs1-OST001b_UUID 284810104 - 284950528 - - - - lfs1-OST001c_UUID 266730624 - 266862592 - - - - lfs1-OST001d_UUID 267699528 - 267780096 - - - - lfs1-OST001e_UUID 271926964 - 272105472 - - - - lfs1-OST001f_UUID 279551644 - 279707648 - - - - lfs1-OST0020_UUID 262360628 - 262537216 - - - - lfs1-OST0021_UUID 286415060 - 286523392 - - - - lfs1-OST0022_UUID 227815636 - 227934208 - - - - lfs1-OST0023_UUID 271352884 - 271450112 - - - - lfs1-OST0024_UUID 272460280 - 272629760 - - - - lfs1-OST0025_UUID 198046512 - 198180864 - - - - lfs1-OST0026_UUID 289555288 - 289669120 - - - - lfs1-OST0027_UUID 258436324 - 258605056 - - - - lfs1-OST0028_UUID 270941900 - 271056896 - - - - lfs1-OST0029_UUID 251377320 - 251527168 - - - - lfs1-OST002a_UUID 282350136 - 282460160 - - - - lfs1-OST002b_UUID 280608276 - 280756224 - - - - lfs1-OST002c_UUID 284707076 - 284819456 - - - - lfs1-OST002d_UUID 283481900 - 283639808 - - - - lfs1-OST002e_UUID 283875928 - 284033024 - - - - lfs1-OST002f_UUID 241472472 - 241565696 - - - - lfs1-OST0030_UUID 265680488 - 265814016 - - - - lfs1-OST0031_UUID 273082788 - 273154048 - - - - lfs1-OST0032_UUID 227469276 - 227540992 - - - - lfs1-OST0033_UUID 301283952 - 301465600 - - - - lfs1-OST0034_UUID 282200664 - 282329088 - - - - lfs1-OST0035_UUID 236723324 - 236847104 - - - - lfs1-OST0036_UUID 244762536 - 244842496 - - - - lfs1-OST0037_UUID 290396060 - 290586624 - - - - lfs1-OST0038_UUID 247838916 - 247988224 - - - - lfs1-OST0039_UUID 289519172 - 289669120 - - - - lfs1-OST003a_UUID 237240432 - 237371392 - - - - lfs1-OST003b_UUID 281243232 - 281411584 - - - - lfs1-OST003c_UUID 255128092 - 255197184 - - - - lfs1-OST003d_UUID 267801692 - 267911168 - - - - lfs1-OST003e_UUID 268842936 - 268959744 - - - - lfs1-OST003f_UUID 259531064 - 259653632 - - - - lfs1-OST0040_UUID 270812788 - 270925824 - - - - lfs1-OST0041_UUID 256516360 - 256638976 - - - - lfs1-OST0042_UUID 289071672 - 289144832 - - - - lfs1-OST0043_UUID 290809280 - 290979840 - - - - lfs1-OST0044_UUID 265673448 - 265814016 - - - - lfs1-OST0045_UUID 288587388 - 288751616 - - - - lfs1-OST0046_UUID 291939312 - 292028416 - - - - lfs1-OST0047_UUID 232700912 - 232783872 - - - - [root@Jet:fe6 ~] # lfs quota -v -g rthfip-gfdl /lfs1 Disk quotas for group rthfip-gfdl (gid 19104): Filesystem kbytes quota limit grace files quota limit grace /lfs1 0 31457280 33030144 - 0 0 0 - lfs1-MDT0000_UUID 0 - 65536 - 0 - 0 - lfs1-OST0000_UUID 0 - 65536 - - - - - lfs1-OST0001_UUID 0 - 65536 - - - - - lfs1-OST0002_UUID 0 - 65536 - - - - - lfs1-OST0003_UUID 0 - 65536 - - - - - lfs1-OST0004_UUID 0 - 65536 - - - - - lfs1-OST0005_UUID 0 - 65536 - - - - - lfs1-OST0006_UUID 0 - 65536 - - - - - lfs1-OST0007_UUID 0 - 65536 - - - - - lfs1-OST0008_UUID 0 - 65536 - - - - - lfs1-OST0009_UUID 0 - 65536 - - - - - lfs1-OST000a_UUID 0 - 65536 - - - - - lfs1-OST000b_UUID 0 - 65536 - - - - - lfs1-OST000c_UUID 0 - 65536 - - - - - lfs1-OST000d_UUID 0 - 65536 - - - - - lfs1-OST000e_UUID 0 - 65536 - - - - - lfs1-OST000f_UUID 0 - 65536 - - - - - lfs1-OST0010_UUID 0 - 65536 - - - - - lfs1-OST0011_UUID 0 - 65536 - - - - - lfs1-OST0012_UUID 0 - 65536 - - - - - lfs1-OST0013_UUID 0 - 65536 - - - - - lfs1-OST0014_UUID 0 - 65536 - - - - - lfs1-OST0015_UUID 0 - 65536 - - - - - lfs1-OST0016_UUID 0 - 65536 - - - - - lfs1-OST0017_UUID 0 - 65536 - - - - - lfs1-OST0018_UUID 0 - 65536 - - - - - lfs1-OST0019_UUID 0 - 65536 - - - - - lfs1-OST001a_UUID 0 - 65536 - - - - - lfs1-OST001b_UUID 0 - 65536 - - - - - lfs1-OST001c_UUID 0 - 65536 - - - - - lfs1-OST001d_UUID 0 - 65536 - - - - - lfs1-OST001e_UUID 0 - 65536 - - - - - lfs1-OST001f_UUID 0 - 65536 - - - - - lfs1-OST0020_UUID 0 - 65536 - - - - - lfs1-OST0021_UUID 0 - 65536 - - - - - lfs1-OST0022_UUID 0 - 65536 - - - - - lfs1-OST0023_UUID 0 - 65536 - - - - - lfs1-OST0024_UUID 0 - 65536 - - - - - lfs1-OST0025_UUID 0 - 65536 - - - - - lfs1-OST0026_UUID 0 - 65536 - - - - - lfs1-OST0027_UUID 0 - 65536 - - - - - lfs1-OST0028_UUID 0 - 65536 - - - - - lfs1-OST0029_UUID 0 - 65536 - - - - - lfs1-OST002a_UUID 0 - 65536 - - - - - lfs1-OST002b_UUID 0 - 65536 - - - - - lfs1-OST002c_UUID 0 - 65536 - - - - - lfs1-OST002d_UUID 0 - 65536 - - - - - lfs1-OST002e_UUID 0 - 65536 - - - - - lfs1-OST002f_UUID 0 - 65536 - - - - - lfs1-OST0030_UUID 0 - 65536 - - - - - lfs1-OST0031_UUID 0 - 65536 - - - - - lfs1-OST0032_UUID 0 - 65536 - - - - - lfs1-OST0033_UUID 0 - 65536 - - - - - lfs1-OST0034_UUID 0 - 65536 - - - - - lfs1-OST0035_UUID 0 - 65536 - - - - - lfs1-OST0036_UUID 0 - 65536 - - - - - lfs1-OST0037_UUID 0 - 65536 - - - - - lfs1-OST0038_UUID 0 - 65536 - - - - - lfs1-OST0039_UUID 0 - 65536 - - - - - lfs1-OST003a_UUID 0 - 65536 - - - - - lfs1-OST003b_UUID 0 - 65536 - - - - - lfs1-OST003c_UUID 0 - 65536 - - - - - lfs1-OST003d_UUID 0 - 65536 - - - - - lfs1-OST003e_UUID 0 - 65536 - - - - - lfs1-OST003f_UUID 0 - 65536 - - - - - lfs1-OST0040_UUID 0 - 65536 - - - - - lfs1-OST0041_UUID 0 - 65536 - - - - - lfs1-OST0042_UUID 0 - 65536 - - - - - lfs1-OST0043_UUID 0 - 65536 - - - - - lfs1-OST0044_UUID 0 - 65536 - - - - - lfs1-OST0045_UUID 0 - 65536 - - - - - lfs1-OST0046_UUID 0 - 65536 - - - - - lfs1-OST0047_UUID 0 - 65536 - - - - -

          Hi Niu,

          I was able to run with quota debug enabled. The debug logs are very large, but I think we were able to see which groups were having the logs messages printed. There doesn't appear to be anything strange about them:

          Disk quotas for group hfip-gfdl (gid 19103):
          Filesystem kbytes quota limit grace files quota
          limit grace
          /lfs1 18016033720 20971520000 22020096000 - 3544247 0 0 -

          Would it be possible to see whether or not the quota space was eventually acquired? Do you think I would be able to trace it by pid? Or do different threads operate on the same quota task? Is there a transaction ID or anything? I'll look at the code too to see if I can tell.

          Also I could make the logs available. They are each about ~300MB compressed, ~11GB uncompressed.

          Thanks

          kitwestneat Kit Westneat (Inactive) added a comment - Hi Niu, I was able to run with quota debug enabled. The debug logs are very large, but I think we were able to see which groups were having the logs messages printed. There doesn't appear to be anything strange about them: Disk quotas for group hfip-gfdl (gid 19103): Filesystem kbytes quota limit grace files quota limit grace /lfs1 18016033720 20971520000 22020096000 - 3544247 0 0 - Would it be possible to see whether or not the quota space was eventually acquired? Do you think I would be able to trace it by pid? Or do different threads operate on the same quota task? Is there a transaction ID or anything? I'll look at the code too to see if I can tell. Also I could make the logs available. They are each about ~300MB compressed, ~11GB uncompressed. Thanks

          If the quota acquire constantly fail like this, the io will not finished.

          Could you run "lfs quota -v uid/gid" to see what's the quota limits of the uid/gid which is causing the problem? Thanks.

          niu Niu Yawei (Inactive) added a comment - If the quota acquire constantly fail like this, the io will not finished. Could you run "lfs quota -v uid/gid" to see what's the quota limits of the uid/gid which is causing the problem? Thanks.

          People

            niu Niu Yawei (Inactive)
            orentas Oz Rentas (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: