[LU-4110] Odd quota on single OST Created: 16/Oct/13  Updated: 14/Mar/14  Resolved: 14/Mar/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Matteo Piccinini (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Incomplete Votes: 0
Labels: None
Environment:

e2fsprogs-1.42.7.wc1-7.el6.x86_64
e2fsprogs-devel-1.42.7.wc1-7.el6.x86_64
e2fsprogs-static-1.42.7.wc1-7.el6.x86_64
e2fsprogs-libs-1.42.7.wc1-7.el6.x86_64


Severity: 3
Rank (Obsolete): 11066

 Description   

Hello,

we have a problem with our new Lustre installation.
The quota on a single OST (OST001b) seems to be corrupt. We were generating files in a loop for testing, and the quota was hit much sooner than the configured limit, but only for some files.
We have also tried turning quotas off completely for user and group and then on again, but that did not help.

How can we recover the quota on OST001b?



 Comments   
Comment by Matteo Piccinini (Inactive) [ 16/Oct/13 ]

[root@node123]# lfs quota -u eric -v /cluster/scratch_xp
Disk quotas for user eric (uid 804):
    Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
/cluster/scratch_xp 145400  2147483648 2147483648       -   36136  209715  209715       -
prism-MDT0000_UUID 1712       -       0       -   36136       -   65536       -
prism-OST0000_UUID  4516       - 16777216       -       -       -       -       -
prism-OST0001_UUID  4516       - 16777216       -       -       -       -       -
prism-OST0002_UUID  4520       - 16777216       -       -       -       -       -
prism-OST0003_UUID 4520       - 16777216       -       -       -       -       -
prism-OST0004_UUID  4508       - 16777216       -       -       -       -       -
prism-OST0005_UUID 4512       - 16777216       -       -       -       -       -
prism-OST0006_UUID  4520       - 16777216       -       -       -       -       -
prism-OST0007_UUID 4512       - 16777216       -       -       -       -       -
prism-OST0008_UUID 4512       - 16777216       -       -       -       -       -
prism-OST0009_UUID 4516       - 16777216       -       -       -       -       -
prism-OST000a_UUID 4520       - 16777216       -       -       -       -       -
prism-OST000b_UUID  4512       - 16777216       -       -       -       -       -
prism-OST000c_UUID  4516       - 16777216       -       -       -       -       -
prism-OST000d_UUID 4520       - 16777216       -       -       -       -       -
prism-OST000e_UUID 4516       - 16777216       -       -       -       -       -
prism-OST000f_UUID 4516       - 16777216       -       -       -       -       -
prism-OST0010_UUID 4516       - 16777216       -       -       -       -       -
prism-OST0011_UUID 4520       - 16777216       -       -       -       -       -
prism-OST0012_UUID 4516       - 16777216       -       -       -       -       -
prism-OST0013_UUID 4516       - 16777216       -       -       -       -       -
prism-OST0014_UUID 4516       - 16777216       -       -       -       -       -
prism-OST0015_UUID 4516       - 16777216       -       -       -       -       -
prism-OST0016_UUID 4516       - 16777216       -       -       -       -       -
prism-OST0017_UUID 4520       - 16777216       -       -       -       -       -
prism-OST0018_UUID 4520       - 16777216       -       -       -       -       -
prism-OST0019_UUID 4516       - 16777216       -       -       -       -       -
prism-OST001a_UUID 4512       - 16777216       -       -       -       -       -
prism-OST001b_UUID 3680*      -    3680       -       -       -       -       -
prism-OST001c_UUID 4516       - 16777216       -       -       -       -       -
prism-OST001d_UUID 4516       - 16777216       -       -       -       -       -
prism-OST001e_UUID 4520       - 16777216       -       -       -       -       -
prism-OST001f_UUID 4520       - 16777216       -       -       -       -       -
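With 32 OSTs in the listing, the odd entry is easy to miss by eye. A minimal sketch, assuming the column layout shown above: it prints every target whose kbytes field carries the `*` marker, which as far as I understand `lfs quota` appends when the usage has reached or exceeded the limit. The function name `flag_over_quota` is made up for illustration.

```shell
# Read `lfs quota -v` output on stdin and print each target whose
# kbytes column (field 2) ends in '*', together with its block limit
# (field 4). Assumes the column layout from the listing above.
flag_over_quota() {
    awk '$1 ~ /_UUID$/ && $2 ~ /\*$/ {
        sub(/\*$/, "", $2)                       # drop the marker
        print $1, "used=" $2, "limit=" $4
    }'
}
```

Usage would be something like `lfs quota -u eric -v /cluster/scratch_xp | flag_over_quota`, which on the listing above should report only prism-OST001b_UUID.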

Comment by Matteo Piccinini (Inactive) [ 16/Oct/13 ]

[root@node123]# perl -e 'printf("p-oss0%d\n",0x1b%8+1)'
p-oss04

I did not observe anything unusual in the /var/log/messages files of p-oss04.
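The one-liner above appears to encode this site's layout: the OST index modulo 8, plus one, gives the OSS number. A sketch of the same arithmetic as a reusable shell function, assuming that mapping holds for every OST; the function name `oss_for_ost` is hypothetical.

```shell
# Map a hex OST index (without the 0x prefix, e.g. 1b or 001b) to the
# OSS host name, assuming the index-modulo-8 layout implied by the
# perl one-liner above.
oss_for_ost() {
    printf 'p-oss0%d\n' $(( 0x$1 % 8 + 1 ))
}
```

For example, `oss_for_ost 1b` gives p-oss04, matching the perl result.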

Comment by Matteo Piccinini (Inactive) [ 16/Oct/13 ]

[root@p-oss04 quota_slave]# grep -wA1 804 /proc/fs/lustre/osd-ldiskfs/prism-OST001b/quota_slave/limit_user
limit_user:- id: 804
limit_user- limits: { hard: 2147483648, soft: 2147483648, granted: 0, time: 0 }

Comment by Matteo Piccinini (Inactive) [ 16/Oct/13 ]

[root@p-oss04 quota_slave]# cat acct_user
usr_accounting:
- id: 0
  usage: { inodes: 3557, kbytes: 1028 }
- id: 804
  usage: { inodes: 1132, kbytes: 3680 }

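The accounting above reports 3680 KB for uid 804, the same figure `lfs quota` showed for this OST, so the usage itself looks consistent and it is the limit column that stands out. A small sketch for pulling one id's usage out of acct_user so it can be compared across OSTs; the function name `usage_for_id` is made up, and the parsing assumes the "id:" / "usage:" line pairs shown above.

```shell
# Read acct_user content on stdin and print the usage record for the
# numeric id given as $1. Matches any line whose last two fields are
# "id:" and the id, then prints the braces part of the next usage line.
usage_for_id() {
    awk -v id="$1" '
        NF >= 2 && $(NF-1) == "id:" && $NF == id { hit = 1; next }
        hit && /usage:/ { sub(/^[^{]*/, ""); print; exit }
    '
}
```

On the OSS this could be run as `usage_for_id 804 < acct_user` from the quota_slave directory.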
Comment by Matteo Piccinini (Inactive) [ 16/Oct/13 ]

I also tried turning quotas off completely for user and group and then back on. That did
not help either:

[root@p-mds1 ~]# lctl get_param osd-*.*.quota_slave.info
osd-ldiskfs.prism-MDT0000.quota_slave.info=
target name: prism-MDT0000
pool ID: 0
type: md
quota enabled: ug
conn to master: setup
space acct: ug
user uptodate: glb[1],slv[1],reint[0]
group uptodate: glb[1],slv[1],reint[0]
[root@p-mds1 ~]# lctl conf_param prism.quota.ost=none
[root@p-mds1 ~]# lctl conf_param prism.quota.mdt=none
[root@p-mds1 ~]# lctl get_param osd-*.*.quota_slave.info
osd-ldiskfs.prism-MDT0000.quota_slave.info=
target name: prism-MDT0000
pool ID: 0
type: md
quota enabled: none
conn to master: setup
space acct: ug
user uptodate: glb[1],slv[1],reint[0]
group uptodate: glb[1],slv[1],reint[0]
[root@p-mds1 ~]# lctl conf_param prism.quota.mdt=ug
[root@p-mds1 ~]# lctl conf_param prism.quota.ost=ug
[root@p-mds1 ~]# lctl get_param osd-*.*.quota_slave.info
osd-ldiskfs.prism-MDT0000.quota_slave.info=
target name: prism-MDT0000
pool ID: 0
type: md
quota enabled: ug
conn to master: setup
space acct: ug
user uptodate: glb[1],slv[1],reint[0]
group uptodate: glb[1],slv[1],reint[0]

How can I recover the quota on OST001b?

Comment by Niu Yawei (Inactive) [ 16/Oct/13 ]

Could you collect logs (with D_QUOTA enabled) for OST001b and the MDT? Thanks.

  • Enable D_QUOTA with "echo +quota > /proc/sys/lnet/debug" on the OSS where OST001b is located and on the MDS;
  • lctl debug_daemon start tmpfile 500; (on OSS & MDS)
  • lctl mark "=== start test"; (on OSS & MDS)
  • Write to a file that is striped on OST001b as user eric; (on client)
  • lctl debug_daemon stop; (on OSS & MDS)
  • lctl debug_file tmpfile logfile; (on OSS & MDS)

Hope we can find something useful from the logfiles.
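The server-side steps above could be wrapped roughly as follows, to be run on both the OSS and the MDS. This is only a sketch: the function names, the `DRY_RUN` switch, and any file paths passed in are made up for illustration, and `lctl set_param debug=+quota` is used in place of the echo into /proc, which should have the same effect.

```shell
# Print commands instead of executing them when DRY_RUN is set
# (hypothetical helper, handy for checking the sequence first).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Before the test write: enable quota debugging and start the daemon.
# $1: binary dump file (the "tmpfile" from the steps above)
quota_debug_start() {
    run lctl set_param debug=+quota
    run lctl debug_daemon start "$1" 500
    run lctl mark "=== start test"
}

# After the test write: stop the daemon and convert the dump to text.
# $1: binary dump file, $2: readable log file to produce
quota_debug_stop() {
    run lctl debug_daemon stop
    run lctl debug_file "$1" "$2"
}
```

Between the two calls, write to a file striped on the affected OST as the affected user from a client.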

Comment by Niu Yawei (Inactive) [ 18/Oct/13 ]

logs: ftp://ftp.whamcloud.com/uploads/LU-4110

I didn't find anything abnormal in the logs, and the test passed successfully. Maybe there was a connection problem between OST001b and the MDT?

Please keep reporting if you find anything wrong. Thank you.

Comment by John Fuchs-Chesney (Inactive) [ 20/Feb/14 ]

Hello Matteo,
We have not heard from you in a while on this issue.
Can I mark it as resolved?
Thanks,
~ jfc.

Comment by John Fuchs-Chesney (Inactive) [ 14/Mar/14 ]

This seems to have been a transient problem.
We can reopen the ticket if the problem recurs.
~ jfc.
