|
Niu,
Could you please advise on this one?
Thanks,
Peter
|
|
It looks like quota wasn't properly enabled on the backend filesystem. For the OSS which missed the e2fsprogs upgrade, did you re-run tunefs.lustre --quota to enable quota after e2fsprogs was upgraded? Do you have logs from when the servers started? Thanks.
|
|
tunefs.lustre --quota was run on each of the OSS's again after we noticed that the package had not been upgraded.
From what point do you want the kernel logs?
|
|
Hi James, I'd like to see if there are any error messages from tunefs.lustre --quota and at server start time. Thanks. BTW: what's the kernel version of the servers?
|
|
We are running a kernel compiled from redhat source.
This is from my original ticket to DDN:
I have upgraded the systems to ubuntu precise and they are running (2.6.32-lustre-2.4).
We are running e2fsprogs (1.42.7.wc1-1).
The quota system is not working completely...
Each of the OST's had tunefs.lustre --quota run upon it and this was also run on the MDS and MGS, for example:
tunefs.lustre --quota /dev/lus01-ostf/lus01
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: lus01-OST000f
Index: 15
Lustre FS: lus01
Mount type: ldiskfs
Flags: 0x2
(OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.17.99.10@tcp mgsnode=172.17.99.9@tcp failover.node=172.17.99.8@tcp ost.quota_type=ug
Permanent disk data:
Target: lus01-OST000f
Index: 15
Lustre FS: lus01
Mount type: ldiskfs
Flags: 0x2
(OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.17.99.10@tcp mgsnode=172.17.99.9@tcp failover.node=172.17.99.8@tcp ost.quota_type=ug
I note we have "ost.quota_type=ug" in the parameters and that makes me think we might need to remove that persistent option.
This is the kernel log from the MDT
lctl
lctl > get_param lus01.quota.mdt
error: get_param: /proc/{fs,sys}/{lnet,lustre}/lus01/quota/mdt: Found no match
ls /proc/{fs,sys}/{lnet,lustre}/*/quota
ls: cannot access /proc/fs/lnet/*/quota: No such file or directory
ls: cannot access /proc/fs/lustre/*/quota: No such file or directory
ls: cannot access /proc/sys/lnet/*/quota: No such file or directory
ls: cannot access /proc/sys/lustre/*/quota: No such file or directory
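(In 2.4 the quota state seems to have moved out of /proc/{fs,sys}/{lnet,lustre}/<fs>/quota, which would explain the failed lookups above; if the new layout is what I think it is, something like the following should show the per-target slave state, though treat the exact parameter name as an assumption:)
lctl get_param osd-ldiskfs.*.quota_slave.info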
Aug 13 10:33:30 lus01-mds1 kernel: LustreError: 22667:0:(mgs_llog.c:2899:mgs_write_log_quota()) parameter quota.ost isn't supported (only quota.mdt & quota.ost are)
Aug 13 10:33:30 lus01-mds1 kernel: LustreError: 22667:0:(mgs_llog.c:3578:mgs_write_log_param()) err -22 on param 'quota.ost'
Aug 13 10:33:30 lus01-mds1 kernel: LustreError: 22667:0:(mgs_handler.c:941:mgs_iocontrol()) MGS: setparam err: rc = -22
Aug 13 10:35:17 lus01-mds1 kernel: LustreError: 22692:0:(mgs_llog.c:2899:mgs_write_log_quota()) parameter quota.ost isn't supported (only quota.mdt & quota.ost are)
Aug 13 10:35:17 lus01-mds1 kernel: LustreError: 22692:0:(mgs_llog.c:3578:mgs_write_log_param()) err -22 on param 'quota.ost'
Aug 13 10:35:17 lus01-mds1 kernel: LustreError: 22692:0:(mgs_handler.c:941:mgs_iocontrol()) MGS: setparam err: rc = -22
After one failed MDT mount, where I failed to notice that I had a networking issue, the following is the end of a kernel log on an OSS.
Aug 13 09:33:55 lus01-oss1 kernel: Lustre: lus01-OST0000: recovery is timed out, evict stale exports
Aug 13 09:33:55 lus01-oss1 kernel: Lustre: lus01-OST0000: disconnecting 1 stale clients
Aug 13 09:33:55 lus01-oss1 kernel: Lustre: lus01-OST0001: recovery is timed out, evict stale exports
Aug 13 09:33:55 lus01-oss1 kernel: Lustre: lus01-OST0001: disconnecting 1 stale clients
Aug 13 09:33:55 lus01-oss1 kernel: Lustre: lus01-OST0000: Recovery over after 5:00, of 3 clients 2 recovered and 1 was evicted.
Aug 13 09:33:55 lus01-oss1 kernel: Lustre: lus01-OST0003: Recovery over after 5:00, of 3 clients 2 recovered and 1 was evicted.
Aug 13 09:36:04 lus01-oss1 kernel: Lustre: 9744:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1376382859/real 1376382859] req@ffff880438ee5000 x1443241492742768/t0(0) o38->lus01-MDT0000-lwp-OST0000@172.17.99.10@tcp:12/10 lens 400/544 e 0 to 1 dl 1376382964 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 13 09:36:04 lus01-oss1 kernel: Lustre: 9744:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 11 previous similar messages
Aug 13 09:40:39 lus01-oss1 kernel: Lustre: 9744:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1376383134/real 1376383137] req@ffff88040276bc00 x1443241492742892/t0(0) o38->lus01-MDT0000-lwp-OST0003@172.17.99.9@tcp:12/10 lens 400/544 e 0 to 1 dl 1376383239 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 13 09:40:39 lus01-oss1 kernel: Lustre: 9744:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 22 previous similar messages
Aug 13 09:49:49 lus01-oss1 kernel: Lustre: 9744:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1376383684/real 1376383684] req@ffff8803381c5c00 x1443241492743124/t0(0) o38->lus01-MDT0000-lwp-OST0003@172.17.99.9@tcp:12/10 lens 400/544 e 0 to 1 dl 1376383789 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 13 09:49:49 lus01-oss1 kernel: Lustre: 9744:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 32 previous similar messages
Aug 13 09:50:04 lus01-oss1 kernel: Lustre: lus01-OST0000: deleting orphan objects from 0x0:72330153 to 0x0:72330274
Aug 13 09:50:04 lus01-oss1 kernel: Lustre: lus01-OST0001: deleting orphan objects from 0x0:73064826 to 0x0:73065542
Aug 13 09:50:04 lus01-oss1 kernel: Lustre: lus01-OST0006: deleting orphan objects from 0x0:71940684 to 0x0:71941216
Aug 13 09:50:04 lus01-oss1 kernel: Lustre: lus01-OST0003: deleting orphan objects from 0x0:72426685 to 0x0:72426837
Aug 13 09:50:04 lus01-oss1 kernel: Lustre: lus01-OST0004: deleting orphan objects from 0x0:71736668 to 0x0:71736925
Aug 13 09:50:04 lus01-oss1 kernel: Lustre: lus01-OST0005: deleting orphan objects from 0x0:70934781 to 0x0:70934907
Aug 13 09:50:04 lus01-oss1 kernel: Lustre: lus01-OST0002: deleting orphan objects from 0x0:72574984 to 0x0:72575499
Aug 13 10:03:12 lus01-oss1 kernel: EXT4-fs (dm-9): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:12 lus01-oss1 kernel: EXT4-fs (dm-8): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:12 lus01-oss1 kernel: EXT4-fs (dm-7): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:12 lus01-oss1 kernel: EXT4-fs (dm-6): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:12 lus01-oss1 kernel: EXT4-fs (dm-5): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:13 lus01-oss1 kernel: EXT4-fs (dm-2): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:13 lus01-oss1 kernel: EXT4-fs (dm-3): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:13 lus01-oss1 kernel: EXT4-fs (dm-1): Couldn't mount because of unsupported optional features (100)
The limits appear to be correct (we set them all to 1 before the system was decommissioned); however, the current usage is not right.
Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
root@isg-disc-mon-05:/lustre/scratch101/ensembl# lfs quota -u jb23 -v /lustre/scratch101
Disk quotas for user jb23 (uid 12296):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
[0] 0 1 - [0] 0 1 -
lus01-MDT0000_UUID
0 - 0 - 0 - 0 -
lus01-OST0000_UUID
0 - 0 - - - - -
...
|
|
More from the original ticket:
I note that one of the OSS's seems to get confused about multiple mount protection...
root@lus01-oss4:~# ps -ef |grep fsck
root 28034 22654 0 14:26 pts/1 00:00:00 grep fsck
root@lus01-oss4:~# mount /export/vd30
mount.lustre: mount /dev/mapper/lus01--ost1d-lus01 at /export/vd30 failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.
Aug 14 14:25:35 lus01-oss4 kernel: LustreError: 28023:0:(obd_mount_server.c:1665:server_fill_super()) Unable to start osd on /dev/mapper/lus01--ost1d-lus01: -22
Aug 14 14:25:35 lus01-oss4 kernel: LustreError: 28023:0:(obd_mount.c:1267:lustre_fill_super()) Unable to mount (-22)
Aug 14 14:26:46 lus01-oss4 kernel: LDISKFS-fs warning (device dm-0): ldiskfs_multi_mount_protect: fsck is running on the filesystem
Aug 14 14:26:46 lus01-oss4 kernel: LDISKFS-fs warning (device dm-0): ldiskfs_multi_mount_protect: MMP failure info: last update time: 1376483804, last update node: lus01-oss4, last update device: /dev/lus01-ost1d/lus01
Aug 14 14:26:46 lus01-oss4 kernel:
Aug 14 14:26:46 lus01-oss4 kernel: LustreError: 28036:0:(osd_handler.c:5349:osd_mount()) lus01-OST001d-osd: can't mount /dev/mapper/lus01--ost1d-lus01: -22
….
root@lus01-oss4:~# tune2fs -f -E clear_mmp /dev/lus01-ost1d/lus01
tune2fs 1.42.7.wc1 (12-Apr-2013)
root@lus01-oss4:~# mount /export/vd30
I then noted the following message and therefore repeated the tunefs.lustre --quota:
Aug 14 14:28:56 lus01-oss4 kernel: LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=off. Opts:
root@lus01-oss4:~# tunefs.lustre --quota /dev/lus01-ost1d/lus01
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: lus01-OST001d
Index: 29
Lustre FS: lus01
Mount type: ldiskfs
Flags: 0x2
(OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.17.99.10@tcp mgsnode=172.17.99.9@tcp failover.node=172.17.99.7@tcp ost.quota_type=ug
Permanent disk data:
Target: lus01-OST001d
Index: 29
Lustre FS: lus01
Mount type: ldiskfs
Flags: 0x2
(OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.17.99.10@tcp mgsnode=172.17.99.9@tcp failover.node=172.17.99.7@tcp ost.quota_type=ug
root@lus01-oss4:~#
Now all the discs are mounted thus:
Aug 14 16:06:06 lus01-oss2 kernel: LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
And we still do not get quotas correctly.
|
|
And more:
That didn't help. In case it is not clear (as it wasn't to me), I think the system has the old quotas in place (we did set them to 1 when we decommissioned the file system).
root@isg-disc-mon-05:~# lfs quota -u jb23 /lustre/scratch101
Disk quotas for user jb23 (uid 12296):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
0 0 1 - 0 0 1 -
root@isg-disc-mon-05:~# lfs setquota -u jb23 -I 2 -B 2 /lustre/scratch101
root@isg-disc-mon-05:~# lfs quota -u jb23 /lustre/scratch101
Disk quotas for user jb23 (uid 12296):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
0 0 2 - 0 0 2 -
root@isg-disc-mon-05:~# lfs setquota -u jb23 -I 1 -B 1 /lustre/scratch101
root@isg-disc-mon-05:~# lfs quota -u jb23 /lustre/scratch101
Disk quotas for user jb23 (uid 12296):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
0 0 1 - 0 0 1 -
root@isg-disc-mon-05:~#
The e2fsck -fp output is interesting:
root@lus01-oss1:~# e2fsck -fp /dev/lus01-ost0/lus01
[QUOTA WARNING] Usage inconsistent for ID 0:actual (3788615680, 1231588) != expected (761856, 139)
lus01-OST0000: Update quota info for quota type 0Project-Id-Version: e2fsprogs
Report-Msgid-Bugs-To: FULL NAME <EMAIL@ADDRESS>
POT-Creation-Date: 2008-06-17 22:16-0400
PO-Revision-Date: 2008-08-10 09:38+0000
Last-Translator: Jen Ockwell <jenfraggleubuntu@googlemail.com>
Language-Team: English (United Kingdom) <en_GB@li.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Launchpad-Export-Date: 2013-01-28 10:46+0000
X-Generator: Launchpad (build 16451)
.
[QUOTA WARNING] Usage inconsistent for ID 0:actual (4592496640, 1237152) != expected (761856, 139)
lus01-OST0000: Update quota info for quota type 1Project-Id-Version: e2fsprogs
Report-Msgid-Bugs-To: FULL NAME <EMAIL@ADDRESS>
POT-Creation-Date: 2008-06-17 22:16-0400
PO-Revision-Date: 2008-08-10 09:38+0000
Last-Translator: Jen Ockwell <jenfraggleubuntu@googlemail.com>
Language-Team: English (United Kingdom) <en_GB@li.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Launchpad-Export-Date: 2013-01-28 10:46+0000
X-Generator: Launchpad (build 16451)
.
lus01-OST0000: 2140413/488366080 files (3.2% non-contiguous), 668882260/1953457152 blocks
|
|
more:
That is interesting. I wonder how that got into e2fsprogs... Maybe some localization files got screwed up? Does the error recur if you rerun e2fsck? If so, can you try it with -fy?
e2fsck /dev/lus01-ost0/lus01
e2fsck 1.42.7.wc1 (12-Apr-2013)
lus01-OST0000: clean, 2140413/488366080 files, 668882263/1953457152 blocks
root@lus01-oss1:~# e2fsck -fp /dev/lus01-ost0/lus01
[QUOTA WARNING] Usage inconsistent for ID 0:actual (3788619776, 1231588) != expected (4096, 32)
lus01-OST0000: Update quota info for quota type 0Project-Id-Version: e2fsprogs
Report-Msgid-Bugs-To: FULL NAME <EMAIL@ADDRESS>
POT-Creation-Date: 2008-06-17 22:16-0400
PO-Revision-Date: 2008-08-10 09:38+0000
Last-Translator: Jen Ockwell <jenfraggleubuntu@googlemail.com>
Language-Team: English (United Kingdom) <en_GB@li.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Launchpad-Export-Date: 2013-01-28 10:46+0000
X-Generator: Launchpad (build 16451)
.
[QUOTA WARNING] Usage inconsistent for ID 0:actual (4592500736, 1237152) != expected (4096, 32)
lus01-OST0000: Update quota info for quota type 1Project-Id-Version: e2fsprogs
Report-Msgid-Bugs-To: FULL NAME <EMAIL@ADDRESS>
POT-Creation-Date: 2008-06-17 22:16-0400
PO-Revision-Date: 2008-08-10 09:38+0000
Last-Translator: Jen Ockwell <jenfraggleubuntu@googlemail.com>
Language-Team: English (United Kingdom) <en_GB@li.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Launchpad-Export-Date: 2013-01-28 10:46+0000
X-Generator: Launchpad (build 16451)
.
lus01-OST0000: 2140413/488366080 files (3.2% non-contiguous), 668882261/1953457152 blocks
root@lus01-oss1:~#
root@lus01-oss1:~#
root@lus01-oss1:~# e2fsck -fy /dev/lus01-ost0/lus01
e2fsck 1.42.7.wc1 (12-Apr-2013)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
lus01-OST0000: 2140413/488366080 files (3.2% non-contiguous), 668882261/1953457152 blocks
followed by
root@lus01-oss1:~# e2fsck -fp /dev/lus01-ost0/lus01
lus01-OST0000: 2140413/488366080 files (3.2% non-contiguous), 668882261/1953457152 blocks
Also, what does lfs quota -u root look like?
root@isg-disc-mon-05:~# lfs quota -u root /lustre/scratch101
Disk quotas for user root (uid 0):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
5 0 0 - 0 0 0 -
|
|
On 15 Aug 2013, at 09:54, Guy Coates <gmpc@sanger.ac.uk> wrote:
Hi all,
The logs on the OST are interesting at mount time; it looks like there is a corrupt quota entry
(Can't read quota structure for id 19228). I wonder if that is a fatal error for the quota subsystem.
Will a forced fsck fix that up?
Aug 14 23:40:59 lus01-oss1 kernel: LDISKFS-fs (dm-14): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 14 23:40:59 lus01-oss1 kernel: Lustre: 10735:0:(ofd_dev.c:221:ofd_process_config()) For interoperability, skip this ost.quota_type. It is obsolete.
Aug 14 23:41:11 lus01-oss1 kernel: Lustre: lus01-OST0000: Will be in recovery for at least 5:00, or until 5 clients reconnect
Aug 14 23:41:35 lus01-oss1 kernel: Lustre: lus01-OST0000: Recovery over after 0:24, of 5 clients 5 recovered and 0 were evicted.
Aug 14 23:41:35 lus01-oss1 kernel: Lustre: lus01-OST0000: deleting orphan objects from 0x0:72330153 to 0x0:72330466
Aug 14 23:41:35 lus01-oss1 kernel: VFS: Quota for id 19228 referenced but not present.
Aug 14 23:41:35 lus01-oss1 kernel: VFS: Can't read quota structure for id 19228.
Aug 14 23:41:35 lus01-oss1 kernel: LustreError: 10738:0:(qsd_entry.c:215:qsd_refresh_usage()) $$$ failed to read disk usage, rc:-3 qsd:lus01-OST0000 qtype:usr id:19228 enforced:1 granted:0 pending:0 waiting:0 req:0 usage:0 qunit:0 qtune:0 edquot:0
Aug 14 23:41:35 lus01-oss1 kernel: Lustre: 10738:0:(qsd_reint.c:349:qsd_reconciliation()) lus01-OST0000: failed to locate lqe. [0x200000006:0x20000:0x0], -3
Aug 14 23:41:35 lus01-oss1 kernel: Lustre: 10738:0:(qsd_reint.c:525:qsd_reint_main()) lus01-OST0000: reconciliation failed. [0x0:0x0:0x0], -3
Aug 15 00:03:15 lus01-oss1 kernel: EXT4-fs (dm-7): Couldn't mount because of unsupported optional features (100)
Aug 15 00:03:15 lus01-oss1 kernel: EXT4-fs (dm-6): Couldn't mount because of unsupported optional features (100)
Aug 15 00:03:15 lus01-oss1 kernel: EXT4-fs (dm-5): Couldn't mount because of unsupported optional features (100)
Aug 15 00:03:15 lus01-oss1 kernel: EXT4-fs (dm-5): Couldn't mount because of unsupported optional features (100)
Aug 15 00:03:16 lus01-oss1 kernel: EXT4-fs (dm-4): Couldn't mount because of unsupported optional features (100)
Aug 15 00:03:16 lus01-oss1 kernel: EXT4-fs (dm-3): Couldn't mount because of unsupported optional features (100)
Aug 15 00:03:16 lus01-oss1 kernel: EXT4-fs (dm-2): Couldn't mount because of unsupported optional features (100)
Aug 15 00:03:16 lus01-oss1 kernel: EXT4-fs (dm-1): Couldn't mount because of unsupported optional features (100)
Aug 15 00:03:16 lus01-oss1 kernel: EXT4-fs (dm-0): Couldn't mount because of unsupported optional features (100)
Aug 15 01:03:30 lus01-oss1 kernel: EXT4-fs (dm-7): Couldn't mount because of unsupported optional features (100)
<last messages repeated indefinitely>
....
On 15 Aug 2013, at 18:44, James Beal <JAMES.BEAL@SANGER.AC.UK> wrote:
On 15 Aug 2013, at 15:55, James Beal wrote:
On 15 Aug 2013, at 15:50, Kit Westneat wrote:
Hi Guy,
I think that's the OST that James ran fsck on already, strange. If you want to try deleting the quota inodes and regenerating them, you can do this:
Thanks Kit, I will try that and see what happens.
Do we need to do it on all the OST's and the MDT?
root@lus01-oss1:~# umount /export/vd01
root@lus01-oss1:~#
{ echo "clri <3>"; echo "clri <4>"; }
| debugfs -w /dev/lus01-ost0/lus01
debugfs 1.42.7.wc1 (12-Apr-2013)
debugfs: clri <3>
debugfs: clri <4>
debugfs:
root@lus01-oss1:~# e2fsck -fy /dev/lus01-ost0/lus01
e2fsck 1.42.7.wc1 (12-Apr-2013)
Pass 1: Checking inodes, blocks, and sizes
Quota inode is not regular file. Clear? yes
Quota inode is not regular file. Clear? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -(1548-1551) -(1555-1557) -(1560-1561) -(1563-1567) -1575 -(1582-1584) -(1586-1587) -1590 -(1592-1600) -(1602-1605) -(1610-1611) -(1614-1616) -(1618-1623) -1626 -(1628-1637) -(1639-1644) -(1655-1658) -(1660-1664) -(1666-1677) -(1679-1681) -(1684-1700) -(1704-1707) -(1712-1715) -(1728-1731) -(1736-1739) -(1745-1751) -(1754-1755) -(1856-1869) -(1903-1912) -(1914-1915) -(1981-1990) -(1992-2013) -4223 -(12320-12341) -12745 -12888
Fix? yes
Free blocks count wrong for group #0 (3327, counted=3538).
Fix? yes
Free blocks count wrong (1284432058, counted=1284432269).
Fix? yes
[ERROR] quotaio.c:246:quota_file_open:: qh_ops->check_file failed
[ERROR] mkquota.c:543:quota_compare_and_update:: Open quota file failed
Update quota info for quota type 0? yes
[ERROR] quotaio.c:246:quota_file_open:: qh_ops->check_file failed
[ERROR] mkquota.c:543:quota_compare_and_update:: Open quota file failed
Update quota info for quota type 1? yes
lus01-OST0000: ***** FILE SYSTEM WAS MODIFIED *****
lus01-OST0000: 2140509/488366080 files (3.2% non-contiguous), 669025094/1953457152 blocks
root@lus01-oss1:~#
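For reference, Kit's sequence for one OST collected into a single snippet (a sketch assuming the same device and mount point as above; inodes <3> and <4> are the user/group quota inodes, as the dumpe2fs output later in this ticket confirms; run only with the target unmounted):
#!/bin/sh
# Clear the user (<3>) and group (<4>) quota inodes, then let e2fsck
# recreate them and rebuild the quota accounting.
dev=/dev/lus01-ost0/lus01
umount /export/vd01
{ echo "clri <3>"; echo "clri <4>"; } | debugfs -w "$dev"
e2fsck -fy "$dev"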
root@isg-disc-mon-05:~# lfs quota -v -u jb23 /lustre/scratch101
Disk quotas for user jb23 (uid 12296):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
3* 0 1 - 0 0 1 -
lus01-MDT0000_UUID
1 - 0 - 0 - 0 -
lus01-OST0000_UUID
0 - 0 - - - - -
lus01-OST0001_UUID
|
Aug 13 10:03:12 lus01-oss1 kernel: EXT4-fs (dm-9): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:12 lus01-oss1 kernel: EXT4-fs (dm-8): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:12 lus01-oss1 kernel: EXT4-fs (dm-7): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:12 lus01-oss1 kernel: EXT4-fs (dm-6): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:12 lus01-oss1 kernel: EXT4-fs (dm-5): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:13 lus01-oss1 kernel: EXT4-fs (dm-2): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:13 lus01-oss1 kernel: EXT4-fs (dm-3): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:13 lus01-oss1 kernel: EXT4-fs (dm-1): Couldn't mount because of unsupported optional features (100)
I don't see where these messages come from; it looks like the backend filesystem doesn't support the quota feature (100), but I'd expect it to be reported as "LDISKFS-fs", not "EXT4-fs". Do you use ldiskfs as the backend filesystem? Could you try to mount the OST device as ldiskfs manually to see if it can be mounted properly? Thanks.
|
|
We have mounted it ldiskfs to verify that the user has objects located on it:
root@lus01-oss1:~# mount -t ldiskfs /dev/lus01-ost0/lus01 /export/vd01
root@lus01-oss1:~# find /export/vd01 -uid 12296 -ls
109 6144 -rw-rw-rw- 1 jb23 4294936579 6291456 Aug 15 17:33 /export/vd01/O/0/d25/72330521
|
Aug 14 23:41:35 lus01-oss1 kernel: VFS: Quota for id 19228 referenced but not present.
Aug 14 23:41:35 lus01-oss1 kernel: VFS: Can't read quota structure for id 19228.
The quota file on some OST seems corrupted; you can truncate & regenerate the quota files by:
- tune2fs -O ^quota $dev (disable the quota feature, which will truncate the quota files);
- tune2fs -O quota $dev (enable the quota feature, which will scan all inodes and write the old quota limits & quota accounting information into the quota files).
After these two steps, we'd expect that e2fsck will no longer report messages like "[QUOTA WARNING] Usage inconsistent for ID 0:actual (3788615680, 1231588) != expected (761856, 139)".
If everything goes well, you can try to mount lustre again to see if the problem is resolved.
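As a sketch, the two steps plus a verification pass for one device (the device path is one from this ticket; the target must be unmounted first):
#!/bin/sh
dev=/dev/lus01-ost0/lus01
tune2fs -O ^quota "$dev"  # disable the quota feature; truncates the quota files
tune2fs -O quota "$dev"   # re-enable; rescans all inodes and rebuilds accounting
e2fsck -fp "$dev"         # should no longer print "[QUOTA WARNING] Usage inconsistent"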
We are running a kernel compiled from redhat source.
What's the kernel version?
10:03:12 lus01-oss1 kernel: EXT4-fs (dm-9): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:12 lus01-oss1 kernel: EXT4-fs (dm-8): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:12 lus01-oss1 kernel: EXT4-fs (dm-7): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:12 lus01-oss1 kernel: EXT4-fs (dm-6): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:12 lus01-oss1 kernel: EXT4-fs (dm-5): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:13 lus01-oss1 kernel: EXT4-fs (dm-2): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:13 lus01-oss1 kernel: EXT4-fs (dm-3): Couldn't mount because of unsupported optional features (100)
Aug 13 10:03:13 lus01-oss1 kernel: EXT4-fs (dm-1): Couldn't mount because of unsupported optional features (100)
Are these devices OST devices? Did you mount the OST devices as ext4 manually? I really want to know where these error messages come from.
|
|
The systems are connected via a network which is being upgraded and are not available today and tomorrow.
The kernel is 2.6.32-lustre-2.4, which is based on RHEL 6.4, I believe.
Yes, the messages are from the OST devices; we did not mount them as ext4, only as lustre or ldiskfs.
|
|
That doesn't appear to have helped.
root@lus01-oss1:~# umount /export/vd01
root@lus01-oss1:~# tune2fs -O ^quota ^C
root@lus01-oss1:~# grep vd01 /etc/fstab
/dev/lus01-ost0/lus01 /export/vd01 lustre extents,mballoc,noauto,rw 0 0
root@lus01-oss1:~# tune2fs -O ^quota /dev/lus01-ost0/lus01
tune2fs 1.42.7.wc1 (12-Apr-2013)
root@lus01-oss1:~# tune2fs -O quota /dev/lus01-ost0/lus01
tune2fs 1.42.7.wc1 (12-Apr-2013)
Warning: the quota feature is still under development
See https://ext4.wiki.kernel.org/index.php/Quota for more information
root@lus01-oss1:~#
mount /export/vd01
root@lus01-oss1:~# cat /proc/fs/lustre/obdfilter/lus01-OST0000/recovery_status
status: RECOVERING
recovery_start: 0
time_remaining: 0
connected_clients: 0/5
req_replay_clients: 0
lock_repay_clients: 0
completed_clients: 0
evicted_clients: 0
replayed_requests: 0
queued_requests: 0
next_transno: 176093659137
root@lus01-oss1:~# cat /proc/fs/lustre/obdfilter/lus01-OST0000/recovery_status
status: COMPLETE
recovery_start: 1377199282
recovery_duration: 87
completed_clients: 5/5
replayed_requests: 0
last_transno: 176093659136
VBR: DISABLED
IR: DISABLED
It doesn't appear to have helped.
jb23@isg-disc-mon-05:~$ lfs quota -v /lustre/scratch101
Disk quotas for user jb23 (uid 12296):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
3* 0 1 - 0 0 1 -
lus01-MDT0000_UUID
1 - 0 - 0 - 0 -
lus01-OST0000_UUID
0 - 0 - - - - -
lus01-OST0001_UUID
1* - 1 - - - - -
root@lus01-oss1:~# umount /export/vd01
root@lus01-oss1:~# e2fsck -fp /dev/lus01-ost0/lus01
[QUOTA WARNING] Usage inconsistent for ID 0:actual (3788619776, 1231588) != expected (0, 32)
lus01-OST0000: Update quota info for quota type 0Project-Id-Version: e2fsprogs
Report-Msgid-Bugs-To: FULL NAME <EMAIL@ADDRESS>
POT-Creation-Date: 2008-06-17 22:16-0400
PO-Revision-Date: 2008-08-10 09:38+0000
Last-Translator: Jen Ockwell <jenfraggleubuntu@googlemail.com>
Language-Team: English (United Kingdom) <en_GB@li.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Launchpad-Export-Date: 2013-01-28 10:46+0000
X-Generator: Launchpad (build 16451)
.
[QUOTA WARNING] Usage inconsistent for ID 0:actual (4598792192, 1237153) != expected (0, 32)
lus01-OST0000: Update quota info for quota type 1Project-Id-Version: e2fsprogs
Report-Msgid-Bugs-To: FULL NAME <EMAIL@ADDRESS>
POT-Creation-Date: 2008-06-17 22:16-0400
PO-Revision-Date: 2008-08-10 09:38+0000
Last-Translator: Jen Ockwell <jenfraggleubuntu@googlemail.com>
Language-Team: English (United Kingdom) <en_GB@li.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Launchpad-Export-Date: 2013-01-28 10:46+0000
X-Generator: Launchpad (build 16451)
.
lus01-OST0000: 2140514/488366080 files (3.2% non-contiguous), 669025091/1953457152 blocks
|
|
Could you paste the output of 'dumpe2fs /dev/lus01-ost0/lus01'? Thanks.
|
|
Filesystem volume name: lus01-OST0000
Last mounted on: /
Filesystem UUID: 1b59b58a-73bc-4fdf-a007-c184da2e6847
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent mmp sparse_super large_file uninit_bg quota
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 488366080
Block count: 1953457152
Reserved block count: 0
Free blocks: 1284432060
Free inodes: 486225566
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 558
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
RAID stride: 1
RAID stripe width: 8
Filesystem created: Fri Mar 19 16:07:37 2010
Last mount time: Thu Aug 22 21:10:45 2013
Last write time: Fri Aug 23 18:41:08 2013
Mount count: 1
Maximum mount count: -1
Last checked: Thu Aug 22 20:26:41 2013
Check interval: 15552000 (6 months)
Next check after: Tue Feb 18 19:26:41 2014
Lifetime writes: 36 TB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 4b551ff2-886c-43e2-abb6-02d583f4c533
Journal backup: inode blocks
MMP block number: 1546
MMP update interval: 1
User quota inode: 3
Group quota inode: 4
Journal features: journal_incompat_revoke
Journal size: 400M
Journal length: 102400
Journal sequence: 0x07b07791
Journal start: 0
|
|
Thank you, James. The output of dumpe2fs looks sane to me.
Look at the output of "e2fsck -fp /dev/lus01-ost0/lus01"
[QUOTA WARNING] Usage inconsistent for ID 0:actual (3788619776, 1231588) != expected (0, 32)
lus01-OST0000: Update quota info for quota type 0Project-Id-Version: e2fsprogs
I don't see why the accounting is still not correct after 'tune2fs -O quota' (which is supposed to do a quotacheck and update the accounting); however, after e2fsck, the quota accounting should have been fixed, as the messages show. Would you try running "e2fsck -fp /dev/lus01-ost0/lus01" again to see if the quota inconsistency is fixed? Thanks.
|
|
Same again....
root@lus01-oss1:~# e2fsck -fp /dev/lus01-ost0/lus01
[QUOTA WARNING] Usage inconsistent for ID 0:actual (3788619776, 1231588) != expected (0, 32)
lus01-OST0000: Update quota info for quota type 0Project-Id-Version: e2fsprogs
Report-Msgid-Bugs-To: FULL NAME <EMAIL@ADDRESS>
POT-Creation-Date: 2008-06-17 22:16-0400
PO-Revision-Date: 2008-08-10 09:38+0000
Last-Translator: Jen Ockwell <jenfraggleubuntu@googlemail.com>
Language-Team: English (United Kingdom) <en_GB@li.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Launchpad-Export-Date: 2013-01-28 10:46+0000
X-Generator: Launchpad (build 16451)
.
[QUOTA WARNING] Usage inconsistent for ID 0:actual (4598792192, 1237153) != expected (0, 32)
lus01-OST0000: Update quota info for quota type 1Project-Id-Version: e2fsprogs
Report-Msgid-Bugs-To: FULL NAME <EMAIL@ADDRESS>
POT-Creation-Date: 2008-06-17 22:16-0400
PO-Revision-Date: 2008-08-10 09:38+0000
Last-Translator: Jen Ockwell <jenfraggleubuntu@googlemail.com>
Language-Team: English (United Kingdom) <en_GB@li.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Launchpad-Export-Date: 2013-01-28 10:46+0000
X-Generator: Launchpad (build 16451)
.
lus01-OST0000: 2140514/488366080 files (3.2% non-contiguous), 669025091/1953457152 blocks
root@lus01-oss1:~#
|
|
Hmm, that's weird. I just tried an upgrade on 1.8.9 but didn't see your problem (actually, we have an upgrade auto-test).
Could you try repeating the "tune2fs -O ^quota" & "tune2fs -O quota" (disable then enable quota) on every MDT/OST device, then remount lustre and run "lfs quota -v" again? Please capture the log from mounting the MDTs/OSTs through to "lfs quota -v". Thanks.
|
|
I am starting that process. Did you note that this file system was originally 1.6?
|
I am starting that process. Did you note that this file system was originally 1.6?
Good point. I think we never tested quota on a system upgraded from 1.6; however, it has been on 1.8 for a while, so I didn't think there would be a problem.
|
|
Given that each OSS is taking over an hour to run that process, I would expect it to be done by the end of the day. Is there anything else we can do?
We really need to have this working before we upgrade all our systems, and we want to do that so we can start using the lustre 2.4 client and hence a modern kernel on all our clients.
|
Given that each OSS is taking over an hour to run that process, I would expect it to be done by the end of the day. Is there anything else we can do?
I can't think of anything else right now. Please save the log for every MDT/OST.
|
|
Hi,
I have another data point. I ran the 2.4 upgrade procedure on a freshly formatted 1.8.8 system and I get the same symptoms: quota accounting/enforcement is not working, even after doing tunefs.lustre --quota.
The OST & MDT both report errors when checked with e2fsck.
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1204224, 200) != expected (0, 34)
Update quota info for quota type 0? yes
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1204224, 200) != expected (0, 34)
Update quota info for quota type 1? yes
If I fix the errors, do some filesystem activity on the client, and then check the OST/MDT for errors again, both report new errors with e2fsck.
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1204224, 201) != expected (0, 32)
Update quota info for quota type 0<y>? yes
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1204224, 201) != expected (0, 32)
Update quota info for quota type 1<y>? yes
However, I have not seen any of the:
EXT4-fs (dm-9): Couldn't mount because of unsupported optional features (100)
errors.
This is using the same 2.4.X kernel/binaries as on the lus01 system, so it does not rule out that we've broken our kernel/server build somehow.
|
|
Hi,
I think there might be something broken in our 2.4 server build. I get the same quota problem on a freshly formatted 2.4 filesystem and 1.8.9 client. If I try to mount using a 2.4 client, the client panics immediately!
I am going to start looking at our 2.4 build to see if we have done something silly...
Cheers,
Guy
|
|
Hi,
I've redone our 2.4 build, and quota on my test system now works correctly: both the 1.8->2.4 upgraded one and the freshly formatted 2.4 system. (I needed a round of e2fsck / tune2fs -O ^quota / tunefs.lustre --quota / lctl conf_param to get the stats in sync.)
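For one OST that round was roughly the following (a sketch rather than the exact commands; the device path is from the lus01 system, and the conf_param lines use the 2.4 quota.mdt/quota.ost syntax seen earlier in this ticket, run on the MGS):
dev=/dev/lus01-ost0/lus01
e2fsck -fy "$dev"
tune2fs -O ^quota "$dev"            # drop the stale quota files
tunefs.lustre --quota "$dev"        # re-enable the quota feature for Lustre
lctl conf_param lus01.quota.mdt=ug  # on the MGS: enforce user/group quota
lctl conf_param lus01.quota.ost=ug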
|
|
Good news, thank you, Guy.
|
|
A collection of log files.
|
|
An update on the state of play:
The following messages were tracked down to grub probing the lustre LUNs to see if there were any OSes on them that needed to be added to grub.
Aug 15 00:03:15 lus01-oss1 kernel: EXT4-fs (dm-7): Couldn't mount because of unsupported optional features (100)
We have put new kernels in place and booted from them and run the following script on each of the luns:
#!/bin/sh
LOG="/root/`echo $1.log | sed e 's#/##g'`"
date | tee -a $LOG
echo $1 2>&1 | tee -a $LOG
tune2fs -O ^quota $1 2>&1 | tee -a $LOG
date 2>&1 | tee -a $LOG
e2fsck -fy $1 2>&1 | tee -a $LOG
date 2>&1 | tee -a $LOG
tunefs.lustre -v --quota $1 2>&1 | tee -a $LOG
date 2>&1 | tee -a $LOG
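Saved as, say, /root/requota.sh (name hypothetical), it can be run across all the LVs on an OSS with something like:
for dev in /dev/lus01-ost*/lus01; do sh /root/requota.sh "$dev"; done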
I have made a tar archive of the results and /var/log/kern.log and added them to this ticket.
The following bits might be relevant:
1404 Aug 30 12:42:17 lus01-mds2 kernel: Lustre: 3243:0:(obd_config.c:1428:class_config_llog_handler()) For 1.8 interoperability, rename obd type from mds to mdt
1405 Aug 30 12:42:17 lus01-mds2 kernel: Lustre: lus01-MDT0000: used disk, loading
1406 Aug 30 12:42:17 lus01-mds2 kernel: LustreError: 3243:0:(sec_config.c:1115:sptlrpc_target_local_read_conf()) missing llog context
1407 Aug 30 12:42:17 lus01-mds2 kernel: Lustre: 3243:0:(mdt_handler.c:4945:mdt_process_config()) For interoperability, skip this mdt.quota_type. It is obsolete.
3226 Aug 30 20:58:48 lus01-oss1 kernel: LDISKFS-fs (dm-14): mounted filesystem with ordered data mode. quota=on. Opts:
3227 Aug 30 20:58:49 lus01-oss1 kernel: Lustre: 28014:0:(ofd_dev.c:221:ofd_process_config()) For interoperability, skip this ost.quota_type. It is obsolete.
42 Sep 2 09:41:35 lus01-oss1 kernel: VFS: Quota for id 19228 referenced but not present.
43 Sep 2 09:41:35 lus01-oss1 kernel: VFS: Can't read quota structure for id 19228.
44 Sep 2 09:41:35 lus01-oss1 kernel: LustreError: 8948:0:(qsd_entry.c:215:qsd_refresh_usage()) $$$ failed to read disk usage, rc:-3 qsd:lus01-OST0000 qtype:usr id:19228 enforced:1 granted:0 pending:0 waiting:0 req:0 usage:0 qunit:0 qtune:0 edquot:0
45 Sep 2 09:41:35 lus01-oss1 kernel: Lustre: 8948:0:(qsd_reint.c:349:qsd_reconciliation()) lus01-OST0000: failed to locate lqe. [0x200000006:0x20000:0x0], -3
46 Sep 2 09:41:35 lus01-oss1 kernel: Lustre: 8948:0:(qsd_reint.c:525:qsd_reint_main()) lus01-OST0000: reconciliation failed. [0x0:0x0:0x0], -3
47 Sep 2 09:43:28 lus01-oss1 kernel: VFS: Quota for id 19228 referenced but not present.
48 Sep 2 09:43:28 lus01-oss1 kernel: VFS: Can't read quota structure for id 19228.
49 Sep 2 09:43:47 lus01-oss1 kernel: VFS: Quota for id 19228 referenced but not present.
50 Sep 2 09:43:47 lus01-oss1 kernel: VFS: Can't read quota structure for id 19228.
51 Sep 2 09:43:47 lus01-oss1 kernel: VFS: Quota for id 19228 referenced but not present.
52 Sep 2 09:43:47 lus01-oss1 kernel: VFS: Can't read quota structure for id 19228.
At first it appears that nothing has changed... However (continued in the next update).
|
|
An experiment with a user who had no data on the system; it looks like quotas "work" for "new" users.
root@isg-disc-mon-05:~# lfs quota -u aac /lustre/scratch101
Disk quotas for user aac (uid 9052):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
0 0 1 - 0 0 1 -
root@isg-disc-mon-05:~# mkdir /lustre/scratch101/sanger/aac
root@isg-disc-mon-05:~# chown aac /lustre/scratch101/sanger/aac
root@isg-disc-mon-05:~# lfs quota -u aac /lustre/scratch101
Disk quotas for user aac (uid 9052):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
4* 0 1 - 1* 0 1 -
root@isg-disc-mon-05:~# lfs setquota -u aac /lustre/scratch101 -I 150000 -B 5T
root@isg-disc-mon-05:~# lfs quota -u aac /lustre/scratch101
Disk quotas for user aac (uid 9052):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
4 0 5368709120 - 1 0 150000 -
root@isg-disc-mon-05:~# lfs quota -u aac /lustre/scratch101 -v
Disk quotas for user aac (uid 9052):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
4 0 5368709120 - 1 0 150000 -
lus01-MDT0000_UUID
4 - 0 - 1 - 0 -
lus01-OST0000_UUID
0 - 64 - - - - -
lus01-OST0001_UUID
0 - 244 - - - - -
lus01-OST0002_UUID
0 - 80 - - - - -
lus01-OST0003_UUID
0 - 44 - - - - -
lus01-OST0004_UUID
0 - 56 - - - - -
lus01-OST0005_UUID
0 - 1856 - - - - -
lus01-OST0006_UUID
0 - 44 - - - - -
lus01-OST0007_UUID
0 - 52 - - - - -
lus01-OST0008_UUID
0 - 252 - - - - -
lus01-OST0009_UUID
0 - 132 - - - - -
lus01-OST000a_UUID
0 - 128 - - - - -
lus01-OST000b_UUID
0 - 72 - - - - -
lus01-OST000c_UUID
0 - 56 - - - - -
lus01-OST000d_UUID
0 - 168 - - - - -
lus01-OST000e_UUID
0 - 252 - - - - -
lus01-OST000f_UUID
0 - 148 - - - - -
lus01-OST0010_UUID
0 - 84 - - - - -
lus01-OST0011_UUID
0 - 132 - - - - -
lus01-OST0012_UUID
0 - 196 - - - - -
lus01-OST0013_UUID
0 - 176 - - - - -
lus01-OST0014_UUID
0 - 292 - - - - -
lus01-OST0015_UUID
0 - 72 - - - - -
lus01-OST0016_UUID
0 - 168 - - - - -
lus01-OST0017_UUID
0 - 48 - - - - -
lus01-OST0018_UUID
0 - 176 - - - - -
lus01-OST0019_UUID
0 - 60 - - - - -
lus01-OST001a_UUID
0 - 144 - - - - -
lus01-OST001b_UUID
0 - 160 - - - - -
lus01-OST001c_UUID
0 - 56 - - - - -
lus01-OST001d_UUID
0 - 68 - - - - -
root@isg-disc-mon-05:~# su - aac
isg-disc-mon-05:~> cd /lustre/scratch101/sanger/aac/
isg-disc-mon-05:/lustre/scratch101/sanger/aac> tar xvf ~jb23/linux-2.6.32-358.6.2.el6.x86_64.tar.gz
./
./config-debug
./usr/
./usr/Kconfig
./usr/initramfs_data.S
./usr/gen_init_cpio.c
./usr/.gitignore
./usr/Makefile
./config-x86_64-nodebug-rhel
./config-i686-debug
./REPORTING-BUGS
./kernel.pub
./config-framepointer
./fs/
./fs/autofs/
./fs/autofs/root.c
^C
isg-disc-mon-05:/lustre/scratch101/sanger/aac> lfs quota /lustre/scratch101
Disk quotas for user aac (uid 9052):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
16 0 5368709120 - 16 0 150000 -
Disk quotas for group hsg (gid 701):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
12 0 0 - 15 0 0 -
isg-disc-mon-05:/lustre/scratch101/sanger/aac> du -sh .
32K .
isg-disc-mon-05:/lustre/scratch101/sanger/aac> find . -print | wc -l
16
|
|
I note that access is particularly slow and the OSS discs seem to be working hard...
From atop -l:
ATOP - lus01-oss1 2013/09/02 11:38:31 ------ 3s elapsed
PRC | sys 0.18s | user 0.01s | | #proc 574 | #trun 1 | #tslpi 643 | #tslpu 60 | #zombie 0 | clones 0 | | #exit 0 |
CPU | sys 4% | user 1% | irq 0% | | idle 44% | wait 753% | | steal 0% | guest 0% | avgf 2.44GHz | avgscal 81% |
CPL | avg1 61.19 | avg5 69.38 | | avg15 73.49 | | | csw 6851 | intr 4912 | | | numcpu 8 |
MEM | tot 15.7G | free 6.3G | cache 736.9M | dirty 282.6M | buff 7.0G | | slab 826.1M | | | | |
SWP | tot 4.0G | free 4.0G | | | | | | | | vmcom 544.9M | vmlim 11.9G |
LVM | --ost6-lus01 | busy 100% | read 3 | write 232 | KiB/r 4 | | KiB/w 4 | MBr/s 0.00 | MBw/s 0.30 | avq 257.71 | avio 12.8 ms |
LVM | --ost5-lus01 | busy 100% | read 0 | write 116 | KiB/r 0 | | KiB/w 4 | MBr/s 0.00 | MBw/s 0.15 | avq 867.63 | avio 25.9 ms |
LVM | --ost4-lus01 | busy 100% | read 0 | write 123 | KiB/r 0 | | KiB/w 4 | MBr/s 0.00 | MBw/s 0.16 | avq 180.31 | avio 24.4 ms |
LVM | --ost3-lus01 | busy 100% | read 70 | write 929 | KiB/r 4 | | KiB/w 4 | MBr/s 0.09 | MBw/s 1.21 | avq 789.85 | avio 3.00 ms |
LVM | --ost2-lus01 | busy 100% | read 448 | write 195 | KiB/r 4 | | KiB/w 4 | MBr/s 0.58 | MBw/s 0.25 | avq 275.68 | avio 4.67 ms |
I can see that ls -l can be slow
root@isg-disc-mon-05:/lustre/scratch101/sanger/jb23/delete/AutoFACT/pathways/cps# time ls -l
total 13504
-rw-r--r-- 1 maa pathdev 13986 2011-02-14 18:49 cps00010.html
-rw-r--r-- 1 maa pathdev 50127 2011-02-04 18:42 cps00010.png
-rw-rw-r-- 1 maa pathdev 13256 2010-12-28 03:32 cps00020.html
-rw-r--r-- 1 maa pathdev 45360 2010-06-25 18:13 cps00020.png
-rw-r--r-- 1 maa pathdev 13513 2010-12-28 03:32 cps00030.html
-rw-r--r-- 1 maa pathdev 52074 2011-01-05 18:13 cps00030.png
-rw-r--r-- 1 maa pathdev 12971 2011-02-14 18:49 cps00040.html
-rw-r--r-- 1 maa pathdev 72999 2010-12-27 20:07 cps00040.png
-rw-r--r-- 1 maa pathdev 12468 2011-03-08 19:16 cps00051.html
-rw-r--r-- 1 maa pathdev 62620 2011-03-08 19:19 cps00051.png
-rw-r--r-- 1 maa pathdev 11351 2010-12-28 03:32 cps00052.html
-rw-r--r-- 1 maa pathdev 57643 2010-12-27 21:47 cps00052.png
-rw-r--r-- 1 maa pathdev 11512 2011-02-14 18:52 cps00053.html
-rw-r--r-- 1 maa pathdev 61274 2010-12-27 23:03 cps00053.png
-rw-rw-r-- 1 maa pathdev 15924 2010-12-28 03:32 cps00061.html
-rw-r--r-- 1 maa pathdev 79674 2010-07-14 18:21 cps00061.png
-rw-r--r-- 1 maa pathdev 17994 2011-02-24 19:55 cps00071.html
-rw-r--r-- 1 maa pathdev 69633 2011-02-24 19:57 cps00071.png
-rw-rw-r-- 1 maa pathdev 6861 2010-12-28 03:32 cps00072.html
-rw-r--r-- 1 maa pathdev 17104 2010-03-23 17:49 cps00072.png
-rw-rw-r-- 1 maa pathdev 16199 2010-12-28 03:32 cps00130.html
-rw-r--r-- 1 maa pathdev 83147 2011-01-31 18:49 cps00130.png
-rw-r--r-- 1 maa pathdev 11724 2011-01-11 18:04 cps00190.html
-rw-r--r-- 1 maa pathdev 154952 2010-11-16 19:49 cps00190.png
-rw-r--r-- 1 maa pathdev 27196 2011-03-11 19:31 cps00230.html
-rw-r--r-- 1 maa pathdev 119322 2011-03-11 19:35 cps00230.png
-rw-r--r-- 1 maa pathdev 19599 2011-03-11 21:18 cps00240.html
-rw-r--r-- 1 maa pathdev 78727 2011-03-11 21:21 cps00240.png
-rw-r--r-- 1 maa pathdev 14025 2011-03-24 18:57 cps00250.html
-rw-r--r-- 1 maa pathdev 66562 2011-03-24 18:59 cps00250.png
-rw-r--r-- 1 maa pathdev 17179 2011-01-12 18:28 cps00260.html
-rw-r--r-- 1 maa pathdev 80375 2011-02-02 19:10 cps00260.png
-rw-r--r-- 1 maa pathdev 17452 2011-01-26 20:01 cps00270.html
-rw-r--r-- 1 maa pathdev 88111 2011-01-26 20:03 cps00270.png
-rw-r--r-- 1 maa pathdev 16385 2011-02-14 18:56 cps00280.html
-rw-r--r-- 1 maa pathdev 63738 2010-09-30 23:17 cps00280.png
-rw-rw-r-- 1 maa pathdev 8869 2010-12-28 03:32 cps00281.html
-rw-r--r-- 1 maa pathdev 30420 2010-06-04 18:01 cps00281.png
-rw-rw-r-- 1 maa pathdev 12504 2010-12-28 03:32 cps00290.html
-rw-r--r-- 1 maa pathdev 44690 2010-11-17 19:11 cps00290.png
-rw-rw-r-- 1 maa pathdev 11687 2010-12-28 03:32 cps00300.html
-rw-r--r-- 1 maa pathdev 48016 2011-01-14 18:36 cps00300.png
-rw-r--r-- 1 maa pathdev 12442 2011-02-14 18:57 cps00310.html
-rw-r--r-- 1 maa pathdev 60245 2010-11-18 19:25 cps00310.png
-rw-r--r-- 1 maa pathdev 8342 2011-02-02 20:14 cps00311.html
-rw-r--r-- 1 maa pathdev 29024 2009-10-14 17:05 cps00311.png
-rw-r--r-- 1 maa pathdev 22023 2011-03-24 20:30 cps00330.html
-rw-r--r-- 1 maa pathdev 123495 2011-03-24 20:32 cps00330.png
-rw-r--r-- 1 maa pathdev 12716 2011-02-14 18:59 cps00340.html
-rw-r--r-- 1 maa pathdev 52095 2011-02-14 19:01 cps00340.png
-rw-rw-r-- 1 maa pathdev 16240 2010-12-28 03:32 cps00350.html
-rw-r--r-- 1 maa pathdev 100076 2011-01-26 23:21 cps00350.png
-rw-rw-r-- 1 maa pathdev 14756 2010-12-28 03:32 cps00360.html
-rw-r--r-- 1 maa pathdev 85562 2011-01-27 00:34 cps00360.png
-rw-rw-r-- 1 maa pathdev 16158 2010-12-28 03:32 cps00361.html
-rw-rw-r-- 1 maa pathdev 108844 2010-12-27 10:28 cps00361.png
-rw-rw-r-- 1 maa pathdev 14764 2010-12-28 03:32 cps00362.html
-rw-rw-r-- 1 maa pathdev 90466 2010-12-27 10:33 cps00362.png
-rw-rw-r-- 1 maa pathdev 9556 2010-12-28 03:32 cps00364.html
-rw-r--r-- 1 maa pathdev 46533 2011-01-17 18:32 cps00364.png
-rw-r--r-- 1 maa pathdev 16568 2011-02-14 19:15 cps00380.html
-rw-r--r-- 1 maa pathdev 97498 2010-12-17 19:11 cps00380.png
-rw-rw-r-- 1 maa pathdev 13288 2010-12-28 03:32 cps00400.html
-rw-r--r-- 1 maa pathdev 60318 2011-02-02 20:58 cps00400.png
-rw-rw-r-- 1 maa pathdev 10543 2010-12-28 03:32 cps00401.html
-rw-r--r-- 1 maa pathdev 63502 2010-12-02 02:04 cps00401.png
-rw-r--r-- 1 maa pathdev 10618 2011-02-14 19:16 cps00410.html
-rw-rw-r-- 1 maa pathdev 43560 2010-12-27 11:07 cps00410.png
-rw-rw-r-- 1 maa pathdev 8542 2010-12-28 03:32 cps00430.html
-rw-r--r-- 1 maa pathdev 27838 2010-06-15 19:53 cps00430.png
-rw-rw-r-- 1 maa pathdev 11738 2010-12-28 03:32 cps00440.html
-rw-rw-r-- 1 maa pathdev 57870 2010-12-27 11:19 cps00440.png
-rw-r--r-- 1 maa pathdev 9499 2011-03-11 22:34 cps00450.html
-rw-r--r-- 1 maa pathdev 34578 2011-03-11 22:37 cps00450.png
-rw-r--r-- 1 maa pathdev 10673 2011-02-23 19:34 cps00460.html
-rw-r--r-- 1 maa pathdev 42915 2011-02-23 19:37 cps00460.png
-rw-r--r-- 1 maa pathdev 7707 2011-03-14 19:25 cps00471.html
-rw-r--r-- 1 maa pathdev 21167 2011-03-14 19:27 cps00471.png
-rw-rw-r-- 1 maa pathdev 6790 2010-12-28 03:32 cps00473.html
-rw-r--r-- 1 maa pathdev 17756 2010-10-29 18:57 cps00473.png
-rw-r--r-- 1 maa pathdev 12633 2011-02-02 21:55 cps00480.html
-rw-rw-r-- 1 maa pathdev 62834 2010-03-01 19:13 cps00480.png
-rw-rw-r-- 1 maa pathdev 14395 2010-12-28 03:32 cps00500.html
-rw-rw-r-- 1 maa pathdev 74275 2010-12-27 11:58 cps00500.png
-rw-rw-r-- 1 maa pathdev 6121 2010-12-28 03:32 cps00511.html
-rw-r--r-- 1 maa pathdev 19740 2011-02-02 22:10 cps00511.png
-rw-rw-r-- 1 maa pathdev 22089 2010-12-28 03:32 cps00520.html
-rw-rw-r-- 1 maa pathdev 132722 2010-12-27 12:13 cps00520.png
-rw-rw-r-- 1 maa pathdev 9778 2010-12-28 03:32 cps00521.html
-rw-rw-r-- 1 maa pathdev 35088 2010-12-27 12:21 cps00521.png
-rw-rw-r-- 1 maa pathdev 9958 2010-12-28 04:52 cps00540.html
-rw-r--r-- 1 maa pathdev 67578 2009-11-16 17:22 cps00540.png
-rw-rw-r-- 1 maa pathdev 13857 2010-12-28 04:56 cps00550.html
-rw-r--r-- 1 maa pathdev 58310 2009-08-29 04:33 cps00550.png
-rw-r--r-- 1 maa pathdev 10094 2011-03-23 20:14 cps00561.html
-rw-r--r-- 1 maa pathdev 52766 2011-03-23 20:15 cps00561.png
-rw-rw-r-- 1 maa pathdev 10986 2010-12-28 05:08 cps00562.html
-rw-r--r-- 1 maa pathdev 53760 2010-11-01 18:45 cps00562.png
-rw-r--r-- 1 maa pathdev 14103 2011-03-07 19:44 cps00564.html
-rw-r--r-- 1 maa pathdev 77876 2011-03-23 22:04 cps00564.png
-rw-r--r-- 1 maa pathdev 13757 2011-03-24 22:22 cps00590.html
-rw-r--r-- 1 maa pathdev 60212 2011-03-24 22:24 cps00590.png
-rw-r--r-- 1 maa pathdev 10019 2011-03-07 22:24 cps00592.html
-rw-r--r-- 1 maa pathdev 38364 2011-03-22 20:38 cps00592.png
-rw-rw-r-- 1 maa pathdev 8935 2010-12-28 05:32 cps00600.html
-rw-r--r-- 1 maa pathdev 46473 2009-12-18 17:23 cps00600.png
-rw-r--r-- 1 maa pathdev 13356 2011-02-14 19:30 cps00620.html
-rw-r--r-- 1 maa pathdev 63303 2011-01-25 19:48 cps00620.png
-rw-rw-r-- 1 maa pathdev 11156 2010-12-28 05:51 cps00623.html
-rw-rw-r-- 1 maa pathdev 63744 2010-12-28 05:51 cps00623.png
-rw-r--r-- 1 maa pathdev 11350 2011-02-14 19:30 cps00625.html
-rw-r--r-- 1 maa pathdev 44536 2011-01-17 19:04 cps00625.png
-rw-r--r-- 1 maa pathdev 13772 2011-02-24 22:22 cps00626.html
-rw-r--r-- 1 maa pathdev 75065 2011-02-24 22:23 cps00626.png
-rw-rw-r-- 1 maa pathdev 14594 2010-12-28 06:09 cps00627.html
-rw-r--r-- 1 maa pathdev 89697 2010-12-17 20:27 cps00627.png
-rw-r--r-- 1 maa pathdev 14234 2011-01-31 21:36 cps00630.html
-rw-r--r-- 1 maa pathdev 71459 2011-01-31 21:38 cps00630.png
-rw-rw-r-- 1 maa pathdev 8905 2010-12-28 06:21 cps00633.html
-rw-r--r-- 1 maa pathdev 40955 2010-12-02 06:42 cps00633.png
-rw-r--r-- 1 maa pathdev 12598 2011-02-14 19:34 cps00640.html
-rw-r--r-- 1 maa pathdev 55499 2011-01-11 18:25 cps00640.png
-rw-r--r-- 1 maa pathdev 7988 2011-02-24 22:59 cps00642.html
-rw-r--r-- 1 maa pathdev 27490 2011-02-24 23:02 cps00642.png
-rw-r--r-- 1 maa pathdev 13229 2011-02-25 19:25 cps00650.html
-rw-r--r-- 1 maa pathdev 56327 2011-02-25 19:26 cps00650.png
-rw-rw-r-- 1 maa pathdev 9932 2010-12-28 06:43 cps00660.html
-rw-rw-r-- 1 maa pathdev 40183 2010-03-02 03:26 cps00660.png
-rw-rw-r-- 1 maa pathdev 8296 2010-12-28 06:48 cps00670.html
-rw-rw-r-- 1 maa pathdev 25657 2010-12-28 06:48 cps00670.png
-rw-r--r-- 1 maa pathdev 25292 2011-03-07 23:17 cps00680.html
-rw-r--r-- 1 maa pathdev 139869 2011-03-07 23:19 cps00680.png
-rw-r--r-- 1 maa pathdev 15911 2011-03-03 19:53 cps00720.html
-rw-r--r-- 1 maa pathdev 63168 2011-03-03 19:55 cps00720.png
-rw-r--r-- 1 maa pathdev 10212 2011-03-12 00:07 cps00730.html
-rw-r--r-- 1 maa pathdev 39113 2011-03-12 00:08 cps00730.png
-rw-r--r-- 1 maa pathdev 9414 2011-03-18 19:31 cps00740.html
-rw-r--r-- 1 maa pathdev 34663 2011-03-22 21:36 cps00740.png
-rw-rw-r-- 1 maa pathdev 10528 2010-12-28 07:17 cps00750.html
-rw-r--r-- 1 maa pathdev 47588 2010-01-08 17:59 cps00750.png
-rw-rw-r-- 1 maa pathdev 12512 2010-12-28 07:23 cps00760.html
-rw-rw-r-- 1 maa pathdev 62140 2010-12-28 07:23 cps00760.png
-rw-rw-r-- 1 maa pathdev 10487 2010-12-28 07:30 cps00770.html
-rw-r--r-- 1 maa pathdev 38306 2010-06-24 20:03 cps00770.png
-rw-rw-r-- 1 maa pathdev 7288 2010-12-28 07:36 cps00780.html
-rw-r--r-- 1 maa pathdev 18189 2011-02-02 23:00 cps00780.png
-rw-rw-r-- 1 maa pathdev 6331 2010-12-28 07:41 cps00785.html
-rw-r--r-- 1 maa pathdev 16647 2009-10-01 17:36 cps00785.png
-rw-rw-r-- 1 maa pathdev 11602 2010-12-28 07:46 cps00790.html
-rw-r--r-- 1 maa pathdev 49292 2010-10-07 19:34 cps00790.png
-rw-r--r-- 1 maa pathdev 22626 2011-03-18 21:08 cps00860.html
-rw-r--r-- 1 maa pathdev 131502 2011-03-18 21:11 cps00860.png
-rw-rw-r-- 1 maa pathdev 13293 2010-12-28 08:04 cps00900.html
-rw-r--r-- 1 maa pathdev 64827 2010-11-12 18:37 cps00900.png
-rw-r--r-- 1 maa pathdev 13428 2011-02-18 19:40 cps00903.html
-rw-r--r-- 1 maa pathdev 73934 2011-02-18 19:41 cps00903.png
-rw-rw-r-- 1 maa pathdev 11168 2010-12-28 08:19 cps00910.html
-rw-r--r-- 1 maa pathdev 51100 2010-10-14 20:28 cps00910.png
-rw-r--r-- 1 maa pathdev 8556 2011-02-09 00:26 cps00920.html
-rw-rw-r-- 1 maa pathdev 33161 2010-12-28 08:26 cps00920.png
-rw-r--r-- 1 maa pathdev 8270 2011-02-14 19:44 cps00930.html
-rw-r--r-- 1 maa pathdev 35600 2011-02-14 19:47 cps00930.png
-rw-r--r-- 1 maa pathdev 17936 2011-01-25 21:15 cps00970.html
-rw-r--r-- 1 maa pathdev 82911 2010-12-02 19:45 cps00970.png
-rw-r--r-- 1 maa pathdev 20928 2011-02-18 19:42 cps01040.html
-rw-r--r-- 1 maa pathdev 62125 2011-02-18 19:43 cps01040.png
-rw-r--r-- 1 maa pathdev 1340513 2011-03-25 21:03 cps01100.html
-rw-r--r-- 1 maa pathdev 1359025 2011-03-25 21:19 cps01100.png
-rw-r--r-- 1 maa pathdev 667587 2011-03-26 04:08 cps01110.html
-rw-r--r-- 1 maa pathdev 663129 2011-03-26 04:12 cps01110.png
-rw-r--r-- 1 maa pathdev 510209 2011-03-12 08:29 cps01120.html
-rw-r--r-- 1 maa pathdev 548705 2011-03-12 08:33 cps01120.png
-rw-r--r-- 1 maa pathdev 21560 2011-03-26 07:08 cps02010.html
-rw-r--r-- 1 maa pathdev 249270 2011-03-26 07:10 cps02010.png
-rw-r--r-- 1 maa pathdev 17142 2011-02-03 01:48 cps02020.html
-rw-r--r-- 1 maa pathdev 212533 2010-10-01 10:31 cps02020.png
-rw-r--r-- 1 maa pathdev 8347 2011-01-12 20:56 cps02030.html
-rw-r--r-- 1 maa pathdev 21835 2010-07-07 22:18 cps02030.png
-rw-rw-r-- 1 maa pathdev 8924 2010-12-28 11:40 cps02040.html
-rw-r--r-- 1 maa pathdev 57726 2010-06-23 20:58 cps02040.png
-rw-rw-r-- 1 maa pathdev 12565 2010-12-28 11:43 cps02060.html
-rw-rw-r-- 1 maa pathdev 59537 2010-03-02 18:04 cps02060.png
-rw-rw-r-- 1 maa pathdev 11918 2010-12-28 11:47 cps03010.html
-rw-r--r-- 1 maa pathdev 93039 2010-12-07 09:08 cps03010.png
-rw-r--r-- 1 maa pathdev 8252 2011-01-24 20:30 cps03018.html
-rw-r--r-- 1 maa pathdev 81469 2011-01-24 20:34 cps03018.png
-rw-r--r-- 1 maa pathdev 5534 2011-03-26 08:04 cps03020.html
-rw-r--r-- 1 maa pathdev 78209 2010-06-16 19:04 cps03020.png
-rw-rw-r-- 1 maa pathdev 8018 2010-12-28 12:06 cps03030.html
-rw-r--r-- 1 maa pathdev 118841 2010-10-21 01:17 cps03030.png
-rw-rw-r-- 1 maa pathdev 7925 2010-12-28 12:15 cps03060.html
-rw-r--r-- 1 maa pathdev 167934 2010-11-30 22:08 cps03060.png
-rw-rw-r-- 1 maa pathdev 9446 2010-12-28 12:21 cps03070.html
-rw-r--r-- 1 maa pathdev 101663 2010-04-14 20:30 cps03070.png
-rw-rw-r-- 1 maa pathdev 7753 2010-12-28 12:27 cps03410.html
-rw-r--r-- 1 maa pathdev 76010 2010-05-27 18:48 cps03410.png
-rw-rw-r-- 1 maa pathdev 7529 2010-12-28 12:33 cps03420.html
-rw-r--r-- 1 maa pathdev 88892 2010-05-27 19:19 cps03420.png
-rw-rw-r-- 1 maa pathdev 8329 2010-12-28 12:40 cps03430.html
-rw-r--r-- 1 maa pathdev 52596 2010-10-21 02:31 cps03430.png
-rw-rw-r-- 1 maa pathdev 8661 2010-12-28 12:45 cps03440.html
-rw-r--r-- 1 maa pathdev 79622 2010-10-21 03:42 cps03440.png
-rw-rw-r-- 1 maa pathdev 10663 2010-12-28 12:56 cps04122.html
-rw-r--r-- 1 maa pathdev 61935 2010-11-06 07:07 cps04122.png
-rw-r--r-- 1 maa pathdev 28782 2011-03-26 10:39 cps_gene_map.tab
-rw-rw-r-- 1 maa pathdev 352465 2011-03-27 00:20 cps.list
real 4m41.294s
user 0m0.000s
sys 0m0.184s
And the client under strace shows the following. I tried mounting the client with noacl but that made no change.
getxattr("asa/asa00900.html", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat("asa/asa02040.png",
{st_mode=S_IFREG|0644, st_size=57726, ...}
) = 0
lgetxattr("asa/asa02040.png", "security.selinux", "", 255) = 0
getxattr("asa/asa02040.png", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
lstat("asa/asa03440.html",
{st_mode=S_IFREG|0664, st_size=8426, ...}
) = 0
lgetxattr("asa/asa03440.html", "security.selinux", "", 255) = 0
Looking at the OI scrub:
cat ./osd-ldiskfs/lus01-MDT0000/oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: completed
flags:
param:
time_since_last_completed: 1731287 seconds
time_since_latest_start: 1732086 seconds
time_since_last_checkpoint: 1731287 seconds
latest_start_position: 329680457
last_checkpoint_position: 1050017793
first_failure_position: N/A
checked: 16323841
updated: 16323820
failed: 0
prior_updated: 0
noscrub: 0
igif: 0
success_count: 1
run_time: 1003 seconds
average_speed: 16275 objects/sec
real-time_speed: N/A
current_position: N/A
While on the OSS we have
/proc/fs/lustre/osd-ldiskfs/lus01-OST0000/oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: init
flags:
param:
time_since_last_completed: N/A
time_since_latest_start: N/A
time_since_last_checkpoint: N/A
latest_start_position: N/A
last_checkpoint_position: N/A
first_failure_position: N/A
checked: 0
updated: 0
failed: 0
prior_updated: 0
noscrub: 0
igif: 0
success_count: 0
run_time: 0 seconds
average_speed: 0 objects/sec
real-time_speed: N/A
current_position: N/A
grep -i status /proc/fs/lustre/osd-ldiskfs/*/oi_scrub
/proc/fs/lustre/osd-ldiskfs/lus01-OST0000/oi_scrub:status: init
/proc/fs/lustre/osd-ldiskfs/lus01-OST0001/oi_scrub:status: init
/proc/fs/lustre/osd-ldiskfs/lus01-OST0002/oi_scrub:status: init
/proc/fs/lustre/osd-ldiskfs/lus01-OST0003/oi_scrub:status: init
/proc/fs/lustre/osd-ldiskfs/lus01-OST0004/oi_scrub:status: init
/proc/fs/lustre/osd-ldiskfs/lus01-OST0005/oi_scrub:status: init
/proc/fs/lustre/osd-ldiskfs/lus01-OST0006/oi_scrub:status: init
|
|
Here I have deleted all the files (I believe) for my userid jb23:
jb23@isg-disc-mon-05:/lustre/scratch101/sanger/jb23$ mkdir test_dir
jb23@isg-disc-mon-05:/lustre/scratch101/sanger/jb23$ lfs setstripe test_dir -c -1
jb23@isg-disc-mon-05:/lustre/scratch101/sanger/jb23$ lfs quota /lustre/scratch101
Disk quotas for user jb23 (uid 12296):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
8 0 5368709120 - 1 0 1500000 -
Disk quotas for group team94 (gid 1105):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
8 0 0 - 1 0 0 -
jb23@isg-disc-mon-05:/lustre/scratch101/sanger/jb23$ dd if=/dev/zero of=test_dir/deleteme
^C384692+0 records in
384692+0 records out
196962304 bytes (197 MB) copied, 3.67066 s, 53.7 MB/s
jb23@isg-disc-mon-05:/lustre/scratch101/sanger/jb23$ lfs quota /lustre/scratch101
Disk quotas for user jb23 (uid 12296):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
161832 0 5368709120 - 2 0 1500000 -
Disk quotas for group team94 (gid 1105):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
161832 0 0 - 2 0 0 -
jb23@isg-disc-mon-05:/lustre/scratch101/sanger/jb23$ ls -l ./test_dir/deleteme
-rw-r--r-- 1 jb23 team94 196962304 2013-09-02 12:29 ./test_dir/deleteme
jb23@isg-disc-mon-05:/lustre/scratch101/sanger/jb23$ lfs quota /lustre/scratch101 -v
Disk quotas for user jb23 (uid 12296):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
192400 0 5368709120 - 2 0 1500000 -
lus01-MDT0000_UUID
8 - 0 - 2 - 0 -
lus01-OST0000_UUID
7168 - 67108864 - - - - -
lus01-OST0001_UUID
7168 - 67108864 - - - - -
lus01-OST0002_UUID
6148 - 67108864 - - - - -
lus01-OST0003_UUID
6148 - 67108864 - - - - -
lus01-OST0004_UUID
6144 - 67108864 - - - - -
lus01-OST0005_UUID
6144 - 67108864 - - - - -
lus01-OST0006_UUID
6144 - 67108864 - - - - -
lus01-OST0007_UUID
7168 - 67108864 - - - - -
lus01-OST0008_UUID
7172 - 67108864 - - - - -
lus01-OST0009_UUID
7008 - 67108864 - - - - -
lus01-OST000a_UUID
6144 - 67108864 - - - - -
lus01-OST000b_UUID
6144 - 67108864 - - - - -
lus01-OST000c_UUID
6148 - 67108864 - - - - -
lus01-OST000d_UUID
6148 - 67108864 - - - - -
lus01-OST000e_UUID
6144 - 67108864 - - - - -
lus01-OST000f_UUID
7168 - 67108864 - - - - -
lus01-OST0010_UUID
6148 - 67108864 - - - - -
lus01-OST0011_UUID
6144 - 67108864 - - - - -
lus01-OST0012_UUID
6144 - 67108864 - - - - -
lus01-OST0013_UUID
6144 - 67108864 - - - - -
lus01-OST0014_UUID
6144 - 67108864 - - - - -
lus01-OST0015_UUID
6144 - 67108864 - - - - -
lus01-OST0016_UUID
7168 - 67108864 - - - - -
lus01-OST0017_UUID
7172 - 67108864 - - - - -
lus01-OST0018_UUID
6144 - 67108864 - - - - -
lus01-OST0019_UUID
6148 - 67108864 - - - - -
lus01-OST001a_UUID
6144 - 67108864 - - - - -
lus01-OST001b_UUID
6148 - 67108864 - - - - -
lus01-OST001c_UUID
6148 - 67108864 - - - - -
lus01-OST001d_UUID
6144 - 67108864 - - - - -
Disk quotas for group team94 (gid 1105):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
192400 0 0 - 2 0 0 -
lus01-MDT0000_UUID
8 - 0 - 2 - 0 -
lus01-OST0000_UUID
7168 - 0 - - - - -
lus01-OST0001_UUID
7168 - 0 - - - - -
lus01-OST0002_UUID
6148 - 0 - - - - -
lus01-OST0003_UUID
6148 - 0 - - - - -
lus01-OST0004_UUID
6144 - 0 - - - - -
lus01-OST0005_UUID
6144 - 0 - - - - -
lus01-OST0006_UUID
6144 - 0 - - - - -
lus01-OST0007_UUID
7168 - 0 - - - - -
lus01-OST0008_UUID
7172 - 0 - - - - -
lus01-OST0009_UUID
7008 - 0 - - - - -
lus01-OST000a_UUID
6144 - 0 - - - - -
lus01-OST000b_UUID
6144 - 0 - - - - -
lus01-OST000c_UUID
6148 - 0 - - - - -
lus01-OST000d_UUID
6148 - 0 - - - - -
lus01-OST000e_UUID
6144 - 0 - - - - -
lus01-OST000f_UUID
7168 - 0 - - - - -
lus01-OST0010_UUID
6148 - 0 - - - - -
lus01-OST0011_UUID
6144 - 0 - - - - -
lus01-OST0012_UUID
6144 - 0 - - - - -
lus01-OST0013_UUID
6144 - 0 - - - - -
lus01-OST0014_UUID
6144 - 0 - - - - -
lus01-OST0015_UUID
6144 - 0 - - - - -
lus01-OST0016_UUID
7168 - 0 - - - - -
lus01-OST0017_UUID
7172 - 0 - - - - -
lus01-OST0018_UUID
6144 - 0 - - - - -
lus01-OST0019_UUID
6148 - 0 - - - - -
lus01-OST001a_UUID
6144 - 0 - - - - -
lus01-OST001b_UUID
6148 - 0 - - - - -
lus01-OST001c_UUID
6148 - 0 - - - - -
lus01-OST001d_UUID
6144 - 0 - - - - -
I then wait a bit, and the quota command gives the "right" answer...
jb23@isg-disc-mon-05:/lustre/scratch101/sanger/jb23$ lfs quota /lustre/scratch101 -v
Disk quotas for user jb23 (uid 12296):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
192400 0 5368709120 - 2 0 1500000 -
lus01-MDT0000_UUID
8 - 0 - 2 - 0 -
lus01-OST0000_UUID
7168 - 67108864 - - - - -
lus01-OST0001_UUID
7168 - 67108864 - - - - -
lus01-OST0002_UUID
6148 - 67108864 - - - - -
lus01-OST0003_UUID
6148 - 67108864 - - - - -
lus01-OST0004_UUID
6144 - 67108864 - - - - -
lus01-OST0005_UUID
6144 - 67108864 - - - - -
lus01-OST0006_UUID
6144 - 67108864 - - - - -
lus01-OST0007_UUID
7168 - 67108864 - - - - -
lus01-OST0008_UUID
7172 - 67108864 - - - - -
lus01-OST0009_UUID
7008 - 67108864 - - - - -
lus01-OST000a_UUID
6144 - 67108864 - - - - -
lus01-OST000b_UUID
6144 - 67108864 - - - - -
lus01-OST000c_UUID
6148 - 67108864 - - - - -
lus01-OST000d_UUID
6148 - 67108864 - - - - -
lus01-OST000e_UUID
6144 - 67108864 - - - - -
lus01-OST000f_UUID
7168 - 67108864 - - - - -
lus01-OST0010_UUID
6148 - 67108864 - - - - -
lus01-OST0011_UUID
6144 - 67108864 - - - - -
lus01-OST0012_UUID
6144 - 67108864 - - - - -
lus01-OST0013_UUID
6144 - 67108864 - - - - -
lus01-OST0014_UUID
6144 - 67108864 - - - - -
lus01-OST0015_UUID
6144 - 67108864 - - - - -
lus01-OST0016_UUID
7168 - 67108864 - - - - -
lus01-OST0017_UUID
7172 - 67108864 - - - - -
lus01-OST0018_UUID
6144 - 67108864 - - - - -
lus01-OST0019_UUID
6148 - 67108864 - - - - -
lus01-OST001a_UUID
6144 - 67108864 - - - - -
lus01-OST001b_UUID
6148 - 67108864 - - - - -
lus01-OST001c_UUID
6148 - 67108864 - - - - -
lus01-OST001d_UUID
6144 - 67108864 - - - - -
Disk quotas for group team94 (gid 1105):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
192400 0 0 - 2 0 0 -
lus01-MDT0000_UUID
8 - 0 - 2 - 0 -
lus01-OST0000_UUID
7168 - 0 - - - - -
lus01-OST0001_UUID
7168 - 0 - - - - -
lus01-OST0002_UUID
6148 - 0 - - - - -
lus01-OST0003_UUID
6148 - 0 - - - - -
lus01-OST0004_UUID
6144 - 0 - - - - -
lus01-OST0005_UUID
6144 - 0 - - - - -
lus01-OST0006_UUID
6144 - 0 - - - - -
lus01-OST0007_UUID
7168 - 0 - - - - -
lus01-OST0008_UUID
7172 - 0 - - - - -
lus01-OST0009_UUID
7008 - 0 - - - - -
lus01-OST000a_UUID
6144 - 0 - - - - -
lus01-OST000b_UUID
6144 - 0 - - - - -
lus01-OST000c_UUID
6148 - 0 - - - - -
lus01-OST000d_UUID
6148 - 0 - - - - -
lus01-OST000e_UUID
6144 - 0 - - - - -
lus01-OST000f_UUID
7168 - 0 - - - - -
lus01-OST0010_UUID
6148 - 0 - - - - -
lus01-OST0011_UUID
6144 - 0 - - - - -
lus01-OST0012_UUID
6144 - 0 - - - - -
lus01-OST0013_UUID
6144 - 0 - - - - -
lus01-OST0014_UUID
6144 - 0 - - - - -
lus01-OST0015_UUID
6144 - 0 - - - - -
lus01-OST0016_UUID
7168 - 0 - - - - -
lus01-OST0017_UUID
7172 - 0 - - - - -
lus01-OST0018_UUID
6144 - 0 - - - - -
lus01-OST0019_UUID
6148 - 0 - - - - -
lus01-OST001a_UUID
6144 - 0 - - - - -
lus01-OST001b_UUID
6148 - 0 - - - - -
lus01-OST001c_UUID
6148 - 0 - - - - -
lus01-OST001d_UUID
6144 - 0 - - - - -
|
|
Another data point:
Changing the owner of a file to someone else and then changing it back will make the file turn up on the right person's quota.
root@isg-disc-mon-05:/lustre/scratch101/ensembl/kb3/scratch/MouseEncode# lfs quota -u kb3 /lustre/scratch101
Disk quotas for user kb3 (uid 11809):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
0 0 1 - 0 0 1 -
root@isg-disc-mon-05:/lustre/scratch101/ensembl/kb3/scratch/MouseEncode# ls -l Compara.12_eutherian_mammals_EPO.tar
-rw-r--r-- 1 kb3 ebiusers 18887997440 2012-03-02 14:56 Compara.12_eutherian_mammals_EPO.tar
root@isg-disc-mon-05:/lustre/scratch101/ensembl/kb3/scratch/MouseEncode# chown kb3 Compara.12_eutherian_mammals_EPO.tar
root@isg-disc-mon-05:/lustre/scratch101/ensembl/kb3/scratch/MouseEncode# lfs quota -u kb3 /lustre/scratch101
Disk quotas for user kb3 (uid 11809):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
0 0 1 - 0 0 1 -
root@isg-disc-mon-05:/lustre/scratch101/ensembl/kb3/scratch/MouseEncode# chown jb23 Compara.12_eutherian_mammals_EPO.tar
root@isg-disc-mon-05:/lustre/scratch101/ensembl/kb3/scratch/MouseEncode# chown kb3 Compara.12_eutherian_mammals_EPO.tar
root@isg-disc-mon-05:/lustre/scratch101/ensembl/kb3/scratch/MouseEncode# lfs quota -u kb3 /lustre/scratch101
Disk quotas for user kb3 (uid 11809):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
18445516* 0 1 - 1* 0 1 -
root@isg-disc-mon-05:/lustre/scratch101/ensembl/kb3/scratch/MouseEncode#
|
|
As a summary:
It appears that new files, or files which have their ownership changed, are included in a user's quota.
We continue to have issues with getting the original quotas into the system.
|
|
Hi, James
> As a summary:
> It appears that new files, or files which have their ownership changed, are included in a user's quota.
Could you explain this a little bit?
> We continue to have issues with getting the original quotas into the system.
I don't quite follow this either...
As Guy said, the quota works for him with the new build:
> I've redone our 2.4 build, and quota on my test system now works correctly; both the 1.8->2.4 upgraded one, and the freshly formatted 2.4 system. (I needed a round of e2fsck / tune2fs -O ^quota / tunefs.lustre --quota / lctl conf_param to get the stats in sync.)
Did you use the new build?
|
|
>> It appears that new files, or files which have their ownership changed, are included in a user's quota.
> Could you explain this a little bit?
New files are correctly accounted for, I think. Changing the owner of a file to root and then changing the ownership back to the original owner will ensure that the file is correctly accounted for. It is worth noting that there is a bit of a delay (about 20 seconds) between the writes and the changes becoming apparent with lfs quota.
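A minimal sketch of automating that chown round-trip (a hypothetical helper, not something run in this ticket; it assumes nothing else is changing ownership concurrently, and it only handles the user side, not groups):
#!/bin/sh
# For every regular file under the given directory, bounce the owner
# through root and back to the original uid so quota re-accounts it.
find "$1" -type f | while read -r f; do
    uid=`stat -c %u "$f"`    # remember the original numeric owner
    chown root "$f"          # change the owner away...
    chown "$uid" "$f"        # ...and straight back
done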
>> We continue to have issues with getting the original quotas into the system.
> I don't quite follow this either...
The system to rescan the filesystem and initialise the quota system is still not working.
> As Guy said, the quota works for him with the new build:
> I've redone our 2.4 build, and quota on my test system now works correctly; both the 1.8->2.4 upgraded one, and the freshly formatted 2.4 system. (I needed a round of e2fsck / tune2fs -O ^quota / tunefs.lustre --quota / lctl conf_param to get the stats in sync.)
> Did you use the new build?
I have used the new build. I am currently trying the following order for e2fsck/tune2fs/tunefs.lustre:
#!/bin/sh
# Run the full re-check sequence on one target device ($1), appending
# all output to a per-device log under /root.
LOG="/root/`echo $1.log | sed -e 's#/#_#g'`"
e2fsck -fy $1 2>&1 | tee -a $LOG                 # force a full fsck, answering yes to fixes
tune2fs -O ^quota $1 2>&1 | tee -a $LOG          # clear the on-disk quota feature
tunefs.lustre -v --quota $1 2>&1 | tee -a $LOG   # re-enable quota via the Lustre tooling
I am also mounting the MDT before running the script on the OSSs.
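For reference, the intended order as a sketch (requota.sh is a placeholder name for the script above; the OST device path is a placeholder, while the MDT device and mount point are the ones quoted later in this ticket):
# On the MDS: run the sequence on the MDT device, then mount it.
./requota.sh /dev/lus01-mdt0/lus01
mount -t lustre /dev/lus01-mdt0/lus01 /export/MDS
# On each OSS, with the MDT mounted: repeat for every OST device.
./requota.sh /dev/lus01-ost0/lus01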
|
|
I have repeated the e2fsck/tune2fs/tunefs.lustre sequence on the MDT, mounted the MDT, and then repeated it for the OSSs.
All quotas report as 0.
|
|
Just to clarify James's remarks:
With the new server build, disk accounting and quota enforcement are working, but only for newly written files.
If I create a 1GB file, the quota system will account for 1GB of space (and will enforce the quota, if appropriate).
cd /lustre/scratch101/sanger/gmpc/test
dd if=/dev/zero of=bigfiles bs=1M count=1000
lfs quota .
Disk quotas for user gmpc (uid 10795):
Filesystem kbytes quota limit grace files quota limit grace
. 1033460 0 104857600 - 437 0 153600 -
However, the quota system is not counting files that existed on the filesystem before the 1.8 --> 2.4 upgrade was done
(e.g. the 4GB of files in this directory are not accounted for at all):
ls -alh /lustre/scratch101/sanger/gmpc/allstripe
-rw-r--r-- 1 gmpc team94 21M 2011-07-06 09:34 fart1.dat.gz
-r-------- 1 gmpc team94 4.0G 2013-09-03 16:10 fart3.dat
-r-------- 1 gmpc team94 11M 2011-06-29 10:32 fart.dat.gz
|
|
Could we raise the priority of this ticket please?
Is there any additional information we can provide, or tests that we can do?
|
|
James, Guy
Are old inodes accounted? If not, could you run the following commands for the MDT device?
- tune2fs -O ^quota mdt_device (disable quota)
- tune2fs -O quota mdt_device (enable quota)
- setup lustre;
- lfs quota -v user_id; (check if old inodes are accounted)
Please save the console output of the first 2 steps and the dmesg of the last 2 steps. Thanks.
|
|
To clarify what is meant by "setup lustre;": is that the mount of the MDT?
|
|
/dev/lus01-mdt0/lus01 /export/MDS lustre noauto 0 0
root@lus01-mds2:~# tune2fs -O ^quota /dev/lus01-mdt0/lus01
tune2fs 1.42.7.wc1 (12-Apr-2013)
root@lus01-mds2:~# tune2fs -O quota /dev/lus01-mdt0/lus01
tune2fs 1.42.7.wc1 (12-Apr-2013)
Warning: the quota feature is still under development
See https://ext4.wiki.kernel.org/index.php/Quota for more information
root@lus01-mds2:~#
|
|
"Is old inode accounted?"
No the process does not fix the inodes.
I could make an image of the MDS and MGS discs and upload them ?
Sep 4 18:08:59 lus01-mds2 kernel: LNet: HW CPU cores: 8, npartitions: 2
Sep 4 18:08:59 lus01-mds2 kernel: alg: No test for crc32 (crc32-table)
Sep 4 18:08:59 lus01-mds2 kernel: alg: No test for adler32 (adler32-zlib)
Sep 4 18:09:08 lus01-mds2 kernel: Lustre: Lustre: Build Version: 2.4.0--PRISTINE-2.6.32-lustre-2.4
Sep 4 18:09:08 lus01-mds2 kernel: LNet: Added LNI 172.17.99.9@tcp [8/256/0/180]
Sep 4 18:09:08 lus01-mds2 kernel: LNet: Accept secure, port 988
Sep 4 18:09:09 lus01-mds2 kernel: LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
Sep 4 18:09:21 lus01-mds2 kernel: LustreError: 28636:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff880438307800 x1445267555483656/t0(0) o253->MGC172.17.99.10@tcp@0@lo:26/25 lens 4768/4768 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Sep 4 18:09:21 lus01-mds2 kernel: LustreError: 28636:0:(obd_mount_server.c:1123:server_register_target()) lus01-MDT0000: error registering with the MGS: rc = -5 (not fatal)
Sep 4 18:09:27 lus01-mds2 kernel: LustreError: 28636:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff880435616400 x1445267555483660/t0(0) o101->MGC172.17.99.10@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Sep 4 18:09:33 lus01-mds2 kernel: LustreError: 28636:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff880435616400 x1445267555483664/t0(0) o101->MGC172.17.99.10@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Sep 4 18:09:33 lus01-mds2 kernel: Lustre: 28769:0:(obd_config.c:1428:class_config_llog_handler()) For 1.8 interoperability, rename obd type from mds to mdt
Sep 4 18:09:33 lus01-mds2 kernel: Lustre: lus01-MDT0000: used disk, loading
Sep 4 18:09:33 lus01-mds2 kernel: LustreError: 28769:0:(sec_config.c:1115:sptlrpc_target_local_read_conf()) missing llog context
Sep 4 18:09:33 lus01-mds2 kernel: Lustre: 28769:0:(mdt_handler.c:4945:mdt_process_config()) For interoperability, skip this mdt.quota_type. It is obsolete.
Sep 4 18:09:40 lus01-mds2 kernel: LustreError: 28636:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff8803d376a000 x1445267555483904/t0(0) o101->MGC172.17.99.10@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Sep 4 18:09:46 lus01-mds2 kernel: LustreError: 28636:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff8803d376a000 x1445267555483912/t0(0) o101->MGC172.17.99.10@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Sep 4 18:09:46 lus01-mds2 kernel: LustreError: 11-0: lus01-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
Sep 4 18:09:57 lus01-mds2 kernel: LustreError: 28636:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff8804001f2c00 x1445267555483920/t0(0) o253->MGC172.17.99.10@tcp@0@lo:26/25 lens 4768/4768 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Sep 4 18:10:12 lus01-mds2 kernel: Lustre: lus01-MDT0000: Will be in recovery for at least 5:00, or until 1 client reconnects
Sep 4 18:10:12 lus01-mds2 kernel: Lustre: lus01-MDT0000: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
root@isg-disc-mon-05:~# lfs quota -u jb23 /lustre/scratch101
Disk quotas for user jb23 (uid 12296):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
0 0 5368709120 - 0 0 1500000 -
root@isg-disc-mon-05:~# touch /lustre/scratch1
root@isg-disc-mon-05:~# chown jb23 /lustre/scratch101/ensembl/kb3
root@isg-disc-mon-05:~# lfs quota -u jb23 /lustre/scratch101
Disk quotas for user jb23 (uid 12296):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
4 0 5368709120 - 1 0 1500000 -
|
|
Hi James,
It looks like there is a DEBUG_QUOTA define that, if set, will spit out a ton of debug data when it does the e2fsck. Would it be possible to recompile e2fsprogs with that and see if it outputs any useful information during the tune2fs?
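A sketch of such a rebuild (it assumes DEBUG_QUOTA is honoured when passed through CFLAGS; the source tree path, device, and log name are illustrative):
# Rebuild e2fsprogs 1.42.7.wc1 with the quota debugging define compiled in.
cd e2fsprogs-1.42.7.wc1
./configure CFLAGS="-g -O2 -DDEBUG_QUOTA"
make
# Run the freshly built tune2fs from the source tree and keep its output.
./misc/tune2fs -O quota /dev/lus01-mdt0/lus01 2>&1 | tee /root/tune2fs-debug.log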
|
|
James, are there any error messages in dmesg when executing 'lfs quota'?
|
|
I reproduced the problem in my local environment and am trying to figure out the reason.
|
|
There is a defect in e2fsprogs which causes the quotacheck (triggered by tune2fs -O quota) to write only a single user's accounting information into the quota file. I posted a fix here: http://review.whamcloud.com/7556
|
|
"I reproduced the problem in my local environment, is trying to figure out the reason."
Thank you for that, your work is very much appreciated 
|
|
I have run the process on the MGS and MDS, and the signs look good.
This is a set of lfs quota outputs after running the process:
root@isg-disc-mon-05:~# lfs quota -u jb23 /lustre/scratch101
Disk quotas for user jb23 (uid 12296):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
24 0 5368709120 - 6 0 1500000 -
root@isg-disc-mon-05:~# lfs quota -u kb3 /lustre/scratch101
Disk quotas for user kb3 (uid 11809):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
2076* 0 1 - 2443* 0 1 -
root@isg-disc-mon-05:~# lfs quota -u gmpc /lustre/scratch101
Disk quotas for user gmpc (uid 10795):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
1352464 0 104857600 - 307736* 0 153600 -
Thu Sep 5 09:14:38 BST 2013
/dev/lus01-mdt0/lus01
e2fsck 1.42.7.wc1 (12-Apr-2013)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
[QUOTA WARNING] Usage inconsistent for ID 0:actual (806342656, 1888) != expected (8192, 0)
[QUOTA WARNING] Usage inconsistent for ID 12296:actual (24576, 6) != expected (4096, 1)
Update quota info for quota type 0? yes
[QUOTA WARNING] Usage inconsistent for ID 0:actual (836046848, 15856) != expected (8192, 0)
Update quota info for quota type 1? yes
lus01-MDT0000: ***** FILE SYSTEM WAS MODIFIED *****
lus01-MDT0000: 16039942/1050017792 files (0.1% non-contiguous), 133891006/1050001408 blocks
Thu Sep 5 09:28:00 BST 2013
tune2fs 1.42.7.wc1 (12-Apr-2013)
Thu Sep 5 09:29:10 BST 2013
Warning: the quota feature is still under development
See https://ext4.wiki.kernel.org/index.php/Quota for more information
tune2fs 1.42.7.wc1 (12-Apr-2013)
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: lus01-MDT0000
Index: 0
Lustre FS: lus01
Mount type: ldiskfs
Flags: 0x1
(MDT )
Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
Parameters: mgsnode=172.17.99.10@tcp mgsnode=172.17.99.9@tcp failover.node=172.17.99.10@tcp mdt.quota_type=ug mdt.group_upcall=/usr/sbin/l_getgroups
Permanent disk data:
Target: lus01-MDT0000
Index: 0
Lustre FS: lus01
Mount type: ldiskfs
Flags: 0x1
(MDT )
Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
Parameters: mgsnode=172.17.99.10@tcp mgsnode=172.17.99.9@tcp failover.node=172.17.99.10@tcp mdt.quota_type=ug mdt.group_upcall=/usr/sbin/l_getgroups
cmd: tune2fs -O quota /dev/lus01-mdt0/lus01
Thu Sep 5 09:34:04 BST 2013
root@lus01-mds2:/root#
Sep 5 09:35:39 lus01-mds2 kernel: LNet: HW CPU cores: 8, npartitions: 2
Sep 5 09:35:39 lus01-mds2 kernel: alg: No test for crc32 (crc32-table)
Sep 5 09:35:39 lus01-mds2 kernel: alg: No test for adler32 (adler32-zlib)
Sep 5 09:35:48 lus01-mds2 kernel: Lustre: Lustre: Build Version: 2.4.0--PRISTINE-2.6.32-lustre-2.4
Sep 5 09:35:48 lus01-mds2 kernel: LNet: Added LNI 172.17.99.9@tcp [8/256/0/180]
Sep 5 09:35:48 lus01-mds2 kernel: LNet: Accept secure, port 988
Sep 5 09:35:49 lus01-mds2 kernel: LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
Sep 5 09:36:01 lus01-mds2 kernel: LustreError: 4973:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff880256f3f400 x1445325856309256/t0(0) o253->MGC172.17.99.10@tcp@0@lo:26/25 lens 4768/4768 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Sep 5 09:36:01 lus01-mds2 kernel: LustreError: 4973:0:(obd_mount_server.c:1123:server_register_target()) lus01-MDT0000: error registering with the MGS: rc = -5 (not fatal)
Sep 5 09:36:07 lus01-mds2 kernel: LustreError: 4973:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff880438307800 x1445325856309260/t0(0) o101->MGC172.17.99.10@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Sep 5 09:36:13 lus01-mds2 kernel: LustreError: 4973:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff880438307800 x1445325856309264/t0(0) o101->MGC172.17.99.10@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Sep 5 09:36:13 lus01-mds2 kernel: Lustre: 5105:0:(obd_config.c:1428:class_config_llog_handler()) For 1.8 interoperability, rename obd type from mds to mdt
Sep 5 09:36:13 lus01-mds2 kernel: Lustre: lus01-MDT0000: used disk, loading
Sep 5 09:36:13 lus01-mds2 kernel: LustreError: 5105:0:(sec_config.c:1115:sptlrpc_target_local_read_conf()) missing llog context
Sep 5 09:36:13 lus01-mds2 kernel: Lustre: 5105:0:(mdt_handler.c:4945:mdt_process_config()) For interoperability, skip this mdt.quota_type. It is obsolete.
Sep 5 09:36:20 lus01-mds2 kernel: LustreError: 4973:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff88043378f000 x1445325856309504/t0(0) o101->MGC172.17.99.10@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Sep 5 09:36:26 lus01-mds2 kernel: LustreError: 4973:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff88037d951000 x1445325856309512/t0(0) o101->MGC172.17.99.10@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Sep 5 09:36:26 lus01-mds2 kernel: LustreError: 11-0: lus01-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
Sep 5 09:36:37 lus01-mds2 kernel: LustreError: 4973:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff88037d951000 x1445325856309520/t0(0) o253->MGC172.17.99.10@tcp@0@lo:26/25 lens 4768/4768 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Sep 5 09:37:17 lus01-mds2 kernel: Lustre: lus01-MDT0000: Will be in recovery for at least 5:00, or until 1 client reconnects
Sep 5 09:37:17 lus01-mds2 kernel: Lustre: lus01-MDT0000: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
Sep 5 09:37:35 lus01-mds2 kernel: Lustre: 5024:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1378370150/real 1378370150] req@ffff880256f3f800 x1445325856309252/t0(0) o250->MGC172.17.99.10@tcp@0@lo:26/25 lens 400/544 e 0 to 1 dl 1378370255 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
|
|
Hi, James
Could you install the e2fsprogs from http://build.whamcloud.com/job/e2fsprogs-reviews/173/ (see http://review.whamcloud.com/#/c/7556/), and disable/enable quota for all your MDT and OST devices by:
tune2fs -O ^quota $dev
tune2fs -O quota $dev
Then set up Lustre to see if the problem is resolved? Thanks.
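For example, a minimal loop over the targets on one server (the device list is a placeholder; substitute the real MDT/OST volumes):
#!/bin/sh
# Placeholder device list; substitute the actual volumes on each server.
for dev in /dev/lus01-mdt0/lus01 /dev/lus01-ost0/lus01; do
    tune2fs -O ^quota "$dev"   # drop the on-disk quota feature
    tune2fs -O quota "$dev"    # re-enable it, re-running the fixed quotacheck
done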
|
|
We did this yesterday and ran through the procedure.
I believe that the patch has fixed the issue.
lfs quota -u jb23 /lustre/scratch101
Disk quotas for user jb23 (uid 12296):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
192412 0 5368709120 - 6 0 1500000 -
lfs quota -u kb3 /lustre/scratch101
Disk quotas for user kb3 (uid 11809):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
150478460* 0 1 - 2443* 0 1 -
lfs quota -g ensembl /lustre/scratch101
Disk quotas for group ensembl (gid 707):
Filesystem kbytes quota limit grace files quota limit grace
/lustre/scratch101
4544447964 0 0 - 2506955 0 0 -
|
|
patch landed
|