  Lustre / LU-3784

Quota issue on system upgraded to 2.4.x

Details

    • Bug
    • Resolution: Fixed
    • Major
    • None
    • Lustre 2.4.0
    • None
    • Ubuntu
    • 3
    • 9787

    Description

      Sanger upgraded a test system to 2.4.0 and is having issues with their quota. The accounting is not working correctly. This file system was originally 1.6.x, but was upgraded to 1.8.x a while back.

      The e2fsprogs were also upgraded to 1.42.7.wc1-1.

      They ran tunefs.lustre --quota on all the OSTs and MDTs. Originally, some of the OSSes had been missed in the e2fsprogs upgrade.

      They ran e2fsck -fp and got:
      [QUOTA WARNING] Usage inconsistent for ID 0:actual (3788615680, 1231588) != expected (761856, 139)
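To put a number on the mismatch in that warning, here is a minimal Python sketch that parses the line and prints the gap between the two pairs. It assumes each pair is (space in bytes, inode count); `parse_quota_warning` is a hypothetical helper written for this note, not part of e2fsprogs.

```python
import re

# Line copied from the e2fsck -fp output quoted in the report.
WARNING = ("[QUOTA WARNING] Usage inconsistent for ID 0:"
           "actual (3788615680, 1231588) != expected (761856, 139)")

# Assumption: each pair is (space in bytes, inode count).
PATTERN = re.compile(
    r"ID (\d+):\s*actual \((\d+), (\d+)\) != expected \((\d+), (\d+)\)"
)


def parse_quota_warning(line):
    """Return the ID and the two usage pairs found in a warning line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a quota warning line")
    qid, a_space, a_inodes, e_space, e_inodes = map(int, match.groups())
    return {
        "id": qid,
        "actual": (a_space, a_inodes),
        "expected": (e_space, e_inodes),
        # Difference between the two pairs, field by field.
        "space_diff": a_space - e_space,
        "inode_diff": a_inodes - e_inodes,
    }


if __name__ == "__main__":
    info = parse_quota_warning(WARNING)
    print("ID", info["id"], "space diff:", info["space_diff"],
          "inode diff:", info["inode_diff"])
```

For ID 0 the two pairs differ by several gigabytes and over a million inodes, so this is not a rounding discrepancy.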

      Running e2fsck -fy afterwards looked clean:
      e2fsck 1.42.7.wc1 (12-Apr-2013)
      Pass 1: Checking inodes, blocks, and sizes
      Pass 2: Checking directory structure
      Pass 3: Checking directory connectivity
      Pass 4: Checking reference counts
      Pass 5: Checking group summary information
      lus01-OST0000: 2140413/488366080 files (3.2% non-contiguous), 668882261/1953457152 blocks

      but still the accounting was wrong:
      root@isg-disc-mon-05:~# lfs quota -u jb23 /lustre/scratch101 -v
      Disk quotas for user jb23 (uid 12296):
              Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
      /lustre/scratch101
                               0       0       1       -       0       0       1       -
      lus01-MDT0000_UUID
                               0       -       0       -       0       -       0       -
      lus01-OST0000_UUID
                               0       -       0       -       -       -       -       -
      lus01-OST0001_UUID
                               0       -       0       -       -       -       -       -

      They had these messages in the logs:
      Aug 14 23:41:35 lus01-oss1 kernel: VFS: Quota for id 19228 referenced but not present.
      Aug 14 23:41:35 lus01-oss1 kernel: VFS: Can't read quota structure for id 19228.
      Aug 14 23:41:35 lus01-oss1 kernel: LustreError: 10738:0:(qsd_entry.c:215:qsd_refresh_usage()) $$$ failed to read disk usage, rc:-3 qsd:lus01-OST0000 qtype:usr id:19228 enforced:1 granted:0 pending:0 waiting:0 req:0 usage:0 qunit:0 qtune:0 edquot:0
      Aug 14 23:41:35 lus01-oss1 kernel: Lustre: 10738:0:(qsd_reint.c:349:qsd_reconciliation()) lus01-OST0000: failed to locate lqe. [0x200000006:0x20000:0x0], -3
      Aug 14 23:41:35 lus01-oss1 kernel: Lustre: 10738:0:(qsd_reint.c:525:qsd_reint_main()) lus01-OST0000: reconciliation failed. [0x0:0x0:0x0], -3
      Aug 15 00:03:15 lus01-oss1 kernel: EXT4-fs (dm-7): Couldn't mount because of unsupported optional features (100)
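The Lustre messages above all end in the return code -3. As a reading aid, this small Python sketch pulls the code out of a log line and maps it to its symbolic errno name. It assumes the code is a negated Linux errno, as is conventional for kernel return codes; the report does not say where the code originates. `extract_rc` and `describe_rc` are hypothetical helpers.

```python
import errno
import os
import re

# Shortened copy of the qsd_refresh_usage() message from the log above.
LOG_LINE = ("LustreError: 10738:0:(qsd_entry.c:215:qsd_refresh_usage()) "
            "$$$ failed to read disk usage, rc:-3 qsd:lus01-OST0000 "
            "qtype:usr id:19228")


def extract_rc(line):
    """Return the integer after 'rc:' in a log line, or None if absent."""
    match = re.search(r"rc:(-?\d+)", line)
    return int(match.group(1)) if match else None


def describe_rc(rc):
    """Map a negative return code to (errno name, description).

    Assumption: rc is a negated errno value on the local platform.
    """
    code = -rc
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    rc = extract_rc(LOG_LINE)
    name, text = describe_rc(rc)
    print(rc, name, text)
```

On Linux, errno 3 is ESRCH.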

      I asked them to try clearing the quota inodes:

      root@lus01-oss1:~# umount /export/vd01
      root@lus01-oss1:~# { echo "clri <3>"; echo "clri <4>"; } | debugfs -w /dev/lus01-ost0/lus01
      debugfs 1.42.7.wc1 (12-Apr-2013)
      debugfs: clri <3>
      debugfs: clri <4>
      debugfs:
      root@lus01-oss1:~# e2fsck -fy /dev/lus01-ost0/lus01
      e2fsck 1.42.7.wc1 (12-Apr-2013)
      Pass 1: Checking inodes, blocks, and sizes
      Quota inode is not regular file. Clear? yes

      Quota inode is not regular file. Clear? yes

      Pass 2: Checking directory structure
      Pass 3: Checking directory connectivity
      Pass 4: Checking reference counts
      Pass 5: Checking group summary information
      Block bitmap differences: -(1548--1551) -(1555--1557) -(1560--1561) -(1563--1567) -1575 -(1582--1584) -(1586--1587) -1590 -(1592--1600) -(1602--1605) -(1610--1611) -(1614--1616) -(1618--1623) -1626 -(1628--1637) -(1639--1644) -(1655--1658) -(1660--1664) -(1666--1677) -(1679--1681) -(1684--1700) -(1704--1707) -(1712--1715) -(1728--1731) -(1736--1739) -(1745--1751) -(1754--1755) -(1856--1869) -(1903--1912) -(1914--1915) -(1981--1990) -(1992--2013) -4223 -(12320--12341) -12745 -12888
      Fix? yes

      Free blocks count wrong for group #0 (3327, counted=3538).
      Fix? yes

      Free blocks count wrong (1284432058, counted=1284432269).
      Fix? yes

      [ERROR] quotaio.c:246:quota_file_open:: qh_ops->check_file failed
      [ERROR] mkquota.c:543:quota_compare_and_update:: Open quota file failed
      Update quota info for quota type 0? yes

      [ERROR] quotaio.c:246:quota_file_open:: qh_ops->check_file failed
      [ERROR] mkquota.c:543:quota_compare_and_update:: Open quota file failed
      Update quota info for quota type 1? yes

      lus01-OST0000: ***** FILE SYSTEM WAS MODIFIED *****
      lus01-OST0000: 2140509/488366080 files (3.2% non-contiguous), 669025094/1953457152 blocks
      root@lus01-oss1:~#
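As a sanity check on the e2fsck run above, the blocks listed under "Block bitmap differences" can be counted and compared with the free-block corrections that follow. The Python sketch below does that. It assumes the fused numbers on that line are four-digit start/end pairs (for example 1548 to 1551) and that every entry is a block being released; `count_blocks` is a hypothetical helper.

```python
# Ranges from the "Block bitmap differences" line, written as start--end.
# Assumption: fused values such as 15481551 stand for the range 1548--1551.
DIFFS = """
-(1548--1551) -(1555--1557) -(1560--1561) -(1563--1567) -1575
-(1582--1584) -(1586--1587) -1590 -(1592--1600) -(1602--1605)
-(1610--1611) -(1614--1616) -(1618--1623) -1626 -(1628--1637)
-(1639--1644) -(1655--1658) -(1660--1664) -(1666--1677) -(1679--1681)
-(1684--1700) -(1704--1707) -(1712--1715) -(1728--1731) -(1736--1739)
-(1745--1751) -(1754--1755) -(1856--1869) -(1903--1912) -(1914--1915)
-(1981--1990) -(1992--2013) -4223 -(12320--12341) -12745 -12888
"""


def count_blocks(diff_text):
    """Count the blocks named by single entries and inclusive ranges."""
    total = 0
    for token in diff_text.split():
        body = token.lstrip("-+").strip("()")
        if "--" in body:
            start, end = map(int, body.split("--"))
            total += end - start + 1  # ranges are inclusive
        else:
            int(body)  # raises ValueError on a malformed entry
            total += 1
    return total


# Free-block counts reported by e2fsck (stored value, recounted value).
GROUP0 = (3327, 3538)
FILESYSTEM = (1284432058, 1284432269)

if __name__ == "__main__":
    print("blocks listed:", count_blocks(DIFFS))
    print("group #0 delta:", GROUP0[1] - GROUP0[0])
    print("filesystem delta:", FILESYSTEM[1] - FILESYSTEM[0])
```

Under those assumptions all three figures come out equal, which is consistent with the bitmap fix accounting for the whole free-block correction.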

      But still no luck. There are definitely objects allocated and in use on the OSTs:
      root@lus01-oss1:~# find /export/vd01 -uid 12296 -ls
      109 6144 -rw-rw-rw- 1 jb23 4294936579 6291456 Aug 15 17:33 /export/vd01/O/0/d25/72330521
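Two fields in that listing are worth a quick arithmetic check, sketched here in Python. It assumes the second column of `find -ls` is disk usage in 1 KiB blocks (the GNU find default) and that the group column is an unsigned 32-bit ID; `as_signed32` is a hypothetical helper. Whether the group value matters for the quota problem is not established by the report.

```python
import struct

# Values copied from the find -ls line above.
BLOCKS_1K = 6144          # second column, assumed to be 1 KiB blocks
SIZE_BYTES = 6291456      # size column
GROUP_ID = 4294936579     # group column, printed numerically


def as_signed32(value):
    """Reinterpret an unsigned 32-bit integer as a signed one."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


if __name__ == "__main__":
    # The object is fully allocated: size in KiB equals the block count.
    print("size in KiB:", SIZE_BYTES // 1024, "allocated KiB:", BLOCKS_1K)
    # The group ID sits just below 2**32, i.e. a small negative number
    # when read as a signed 32-bit value.
    print("gid as signed 32-bit:", as_signed32(GROUP_ID))
```

The object occupies 6144 KiB on the OST for uid 12296, while `lfs quota` reports 0 kbytes for the same user.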

      At this point I'm not sure what to try next. Any ideas, or any debugging that could be done?

            People

              niu Niu Yawei (Inactive)
              kitwestneat Kit Westneat (Inactive)