Details
- Bug
- Resolution: Fixed
- Major
- None
- Lustre 2.4.0
- None
- Ubuntu
- 3
- 9787
Description
Sanger upgraded a test system to 2.4.0 and is now having quota problems: the accounting is not working correctly. This file system was originally 1.6.x and was upgraded to 1.8.x a while back.
The e2fsprogs were also upgraded to 1.42.7.wc1-1.
They ran tunefs.lustre --quota on all the OSTs and MDTs. Originally, some of the OSSes had been missed in the e2fsprogs upgrade.
They ran e2fsck -fp and got:
[QUOTA WARNING] Usage inconsistent for ID 0:actual (3788615680, 1231588) != expected (761856, 139)
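To see how far off the stored usage is, the warning can be reduced to a pair of deltas. This is a minimal sketch: `quota_delta` is a hypothetical helper name, and it assumes the first number in each pair is space usage and the second is the inode count.

```shell
#!/bin/sh
# Hypothetical helper: print actual-minus-expected deltas from an
# e2fsck "[QUOTA WARNING]" line.
# Assumption: each pair is (space usage, inode count).
quota_delta() {
    # Extract: id, actual space, actual inodes, expected space, expected inodes.
    set -- $(printf '%s\n' "$1" | sed -n \
        's/.*ID \([0-9]*\):actual (\([0-9]*\), \([0-9]*\)) != expected (\([0-9]*\), \([0-9]*\)).*/\1 \2 \3 \4 \5/p')
    # Bail out if the line did not match the expected layout.
    [ "$#" -eq 5 ] || return 1
    echo "id=$1 space_delta=$(( $2 - $4 )) inode_delta=$(( $3 - $5 ))"
}

quota_delta '[QUOTA WARNING] Usage inconsistent for ID 0:actual (3788615680, 1231588) != expected (761856, 139)'
```

For the line above, the quota file under-reports ID 0 by roughly 3.79e9 in space and about 1.23 million inodes, so the on-disk quota data is far behind the real usage.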
Running e2fsck -fy afterwards looked clean:
e2fsck 1.42.7.wc1 (12-Apr-2013)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
lus01-OST0000: 2140413/488366080 files (3.2% non-contiguous), 668882261/1953457152 blocks
but still the accounting was wrong:
root@isg-disc-mon-05:~# lfs quota -u jb23 /lustre/scratch101 -v
Disk quotas for user jb23 (uid 12296):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
0 0 1 - 0 0 1 -
lus01-MDT0000_UUID
0 - 0 - 0 - 0 -
lus01-OST0000_UUID
0 - 0 - - - - -
lus01-OST0001_UUID
0 - 0 - - - - -
They had these messages in the logs:
Aug 14 23:41:35 lus01-oss1 kernel: VFS: Quota for id 19228 referenced but not present.
Aug 14 23:41:35 lus01-oss1 kernel: VFS: Can't read quota structure for id 19228.
Aug 14 23:41:35 lus01-oss1 kernel: LustreError: 10738:0:(qsd_entry.c:215:qsd_refresh_usage()) $$$ failed to read disk usage, rc:-3 qsd:lus01-OST0000 qtype:usr id:19228 enforced:1 granted:0 pending:0 waiting:0 req:0 usage:0 qunit:0 qtune:0 edquot:0
Aug 14 23:41:35 lus01-oss1 kernel: Lustre: 10738:0:(qsd_reint.c:349:qsd_reconciliation()) lus01-OST0000: failed to locate lqe. [0x200000006:0x20000:0x0], -3
Aug 14 23:41:35 lus01-oss1 kernel: Lustre: 10738:0:(qsd_reint.c:525:qsd_reint_main()) lus01-OST0000: reconciliation failed. [0x0:0x0:0x0], -3
Aug 15 00:03:15 lus01-oss1 kernel: EXT4-fs (dm-7): Couldn't mount because of unsupported optional features (100)
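When there are many of these messages, it helps to pull out just the identifying fields. The sketch below uses a hypothetical helper, `qsd_fields`, and assumes the `key:value` layout seen in the `qsd_refresh_usage()` line above. On Linux, `rc:-3` corresponds to `-ESRCH`.

```shell
#!/bin/sh
# Hypothetical helper: keep only the rc, qsd, qtype and id fields from a
# qsd_refresh_usage() error line.
# Assumption: fields are space-separated key:value tokens, as in the log above.
qsd_fields() {
    printf '%s\n' "$1" |
        tr ' ' '\n' |                        # one token per line
        grep -E '^(rc|qsd|qtype|id):' |      # keep the fields of interest
        paste -sd ' ' -                      # join them back onto one line
}

qsd_fields 'Aug 14 23:41:35 lus01-oss1 kernel: LustreError: 10738:0:(qsd_entry.c:215:qsd_refresh_usage()) $$$ failed to read disk usage, rc:-3 qsd:lus01-OST0000 qtype:usr id:19228 enforced:1 granted:0 pending:0 waiting:0 req:0 usage:0 qunit:0 qtune:0 edquot:0'
```

Running this over a whole log (one call per matching line) gives a compact list of which IDs fail on which target.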
I asked them to try clearing the quota inodes:
root@lus01-oss1:~# umount /export/vd01
root@lus01-oss1:~# debugfs -w /dev/lus01-ost0/lus01
debugfs 1.42.7.wc1 (12-Apr-2013)
debugfs: clri <3>
debugfs: clri <4>
debugfs:
root@lus01-oss1:~# e2fsck -fy /dev/lus01-ost0/lus01
e2fsck 1.42.7.wc1 (12-Apr-2013)
Pass 1: Checking inodes, blocks, and sizes
Quota inode is not regular file. Clear? yes
Quota inode is not regular file. Clear? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: (1548-1551) -(1555-1557) -(1560-1561) -(1563-1567) -1575 -(1582-1584) -(1586-1587) -1590 -(1592-1600) -(1602-1605) -(1610-1611) -(1614-1616) -(1618-1623) -1626 -(1628-1637) -(1639-1644) -(1655-1658) -(1660-1664) -(1666-1677) -(1679-1681) -(1684-1700) -(1704-1707) -(1712-1715) -(1728-1731) -(1736-1739) -(1745-1751) -(1754-1755) -(1856-1869) -(1903-1912) -(1914-1915) -(1981-1990) -(1992-2013) -4223 -(12320-12341) -12745 -12888
Fix? yes
Free blocks count wrong for group #0 (3327, counted=3538).
Fix? yes
Free blocks count wrong (1284432058, counted=1284432269).
Fix? yes
[ERROR] quotaio.c:246:quota_file_open:: qh_ops->check_file failed
[ERROR] mkquota.c:543:quota_compare_and_update:: Open quota file failed
Update quota info for quota type 0? yes
[ERROR] quotaio.c:246:quota_file_open:: qh_ops->check_file failed
[ERROR] mkquota.c:543:quota_compare_and_update:: Open quota file failed
Update quota info for quota type 1? yes
lus01-OST0000: ***** FILE SYSTEM WAS MODIFIED *****
lus01-OST0000: 2140509/488366080 files (3.2% non-contiguous), 669025094/1953457152 blocks
root@lus01-oss1:~#
But still no luck. There are definitely objects allocated and in use on the OSTs:
root@lus01-oss1:~# find /export/vd01 -uid 12296 -ls
109 6144 rw-rw-rw 1 jb23 4294936579 6291456 Aug 15 17:33 /export/vd01/O/0/d25/72330521
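A quick cross-check against the `kbytes` column of `lfs quota` is to total the block counts that `find -ls` reports for the user. This is a sketch only: `sum_find_kbytes` is a hypothetical helper, and it assumes GNU `find`, where the second column of `-ls` output is in 1 KiB blocks (consistent with 6144 blocks for the 6291456-byte object above).

```shell
#!/bin/sh
# Hypothetical helper: total the block counts (column 2 of `find -ls`) read
# from stdin.
# Assumption: GNU find, so column 2 is in 1 KiB blocks.
sum_find_kbytes() {
    awk '{ kb += $2 } END { printf "%.0f\n", kb }'
}

# Sample input taken from the output above.
printf '%s\n' \
    '109 6144 rw-rw-rw 1 jb23 4294936579 6291456 Aug 15 17:33 /export/vd01/O/0/d25/72330521' |
    sum_find_kbytes
```

On the OSS this would be run as `find /export/vd01 -uid 12296 -ls | sum_find_kbytes` and compared with the value `lfs quota -u jb23 -v` shows for the same OST, which is currently 0.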
At this point I'm not sure what to try next. Any ideas, or any debugging that can be done?
Issue Links
- is related to: LU-3861 Quota issues after upgrade from 2.1.4 to 2.4 (Resolved)