[LU-3839] Incorrect file system usage on Lustre Quota Created: 27/Aug/13  Updated: 06/Feb/14  Resolved: 06/Feb/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.8
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Shuichi Ihara (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

Applied the following patches against 1.8.8:

bd649b1 LU-1754 kernel: Kernel update [RHEL6.3 2.6.32-279.5.1.el6]
1513306 LU-919 obdclass: remove hard coded 0x5a5a5a
7a9fc09 LU-1720 kernel: Quota doesn't work over 4TB on single OST
df3a540 LU-1782 quota: ignore sb_has_quota_active() in OFED's header
8c3084c LU-1496 ptlrpc: prolong rw locks even IO RPCs are finished
747d905 LU-1115 kernel: software raid6 related BUG
01811d4 LU-359 llite: no close error if application has known failure
944d1c1 LU-1488 mdc: fix fid_res_name_eq() issue.
8254103 LU-1511 kernel: kernel update [RHEL5.8 2.6.18-308.11.1.el5]
048c04b LU-1563 quota: Put lqs properly in quota_pending_commit()
e53e756 LU-1535 ldlm: backport fix for LU-1128
3a4f224 LU-1459 llite: Don't LBUG when import has LUSTRE_IMP_NEW state
0152977 LU-1459 llite: Don't use unitialized variable
65b7e5a LU-1448 llite: Prevent NULL pointer dereference on disabled OSC
bd671c0 LU-1438 quota: quota active checking is missed on slave
423bfd1 LU-1438 quota: fix race in quota_chk_acq_common()
e92a9dd LU-814 tests: remove leading spaces from $WRITE_DISJOINT
bc88c4c LU-121 test: Change framework to only use the short hostname.
294b409 LU-458 debug: print client profile name correctly
7ef90f4 LU-1424 kernel: Kernel update [RHEL6.2 2.6.32-220.17.1.el6]
48c2f66 LU-458 debug: use profilenm before running class_del_profile()
fe92ca6 LU-425 tests: fix the issue of using "grep -w" 
dd8037d LU-1340 release: get ready for 1.8.8-wc1 RC1

Attachments: File 2013-08-23.tar.gz     File lctl_dk.out.2.gz     File lctl_dk_wQUOTA_TRACE.out.2.gz     File messages_20130924.tar.gz    
Severity: 3
Rank (Obsolete): 9936

 Description   

Lustre quota does not show the correct file system usage for a user.
When the customer summed up the file sizes, the usage for that user is about 62GB, but "lfs quota" shows 7TB.

$ COUNT=0; for i in `cat file_size_20130821.txt | awk '{ print $5 }'`
    do
        COUNT=`expr ${COUNT} + ${i}`
    done; echo "SUM ${COUNT}"

SUM 65272511
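
(For reference, the same column can be summed in a single awk pass; this is just an equivalent sketch over the customer-provided listing, not a command that was run here.)

$ awk '{ sum += $5 } END { print "SUM", sum }' file_size_20130821.txt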

Lustre quota shows the following.

[root@se1 ~]# date
Fri Aug 23 10:00:35 JST 2013

[root@se1 ~]# lfs quota -u kawashin /nshare2
Disk quotas for user kawashin (uid 14520):
Filesystem kbytes quota limit grace files quota limit grace
/nshare2 7076986516 0 0 - 14157 0 0 -

Running quotacheck did not fix it.

[root@se1 ~]# lfs quotacheck -ug /nshare2 ; date
Fri Aug 23 10:01:50 JST 2013
[root@se1 ~]# lfs quota -u kawashin /nshare2
Disk quotas for user kawashin (uid 14520):
Filesystem kbytes quota limit grace files quota limit grace
/nshare2 7076986516 0 0 - 14157 0 0 -



 Comments   
Comment by Shuichi Ihara (Inactive) [ 27/Aug/13 ]

All log files of OSS/MDS

Comment by Shuichi Ihara (Inactive) [ 27/Aug/13 ]

There are a lot of the following error messages on the MDS.

Aug 23 10:01:32 nmd031i kernel: LustreError: 10005:0:(fsfilt-ldiskfs.c:2243:fsfilt_ldiskfs_dquot()) operate dquot before it's enabled!
Aug 23 10:01:32 nmd031i kernel: LustreError: 10005:0:(quota_master.c:219:lustre_dqget()) can't read dquot from admin quotafile! (rc:-5)
Aug 23 10:01:32 nmd031i kernel: LustreError: 10005:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:32 nmd031i kernel: LustreError: 10009:0:(fsfilt-ldiskfs.c:2243:fsfilt_ldiskfs_dquot()) operate dquot before it's enabled!
Aug 23 10:01:32 nmd031i kernel: LustreError: 10009:0:(quota_master.c:219:lustre_dqget()) can't read dquot from admin quotafile! (rc:-5)
Aug 23 10:01:32 nmd031i kernel: LustreError: 10009:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:32 nmd031i kernel: LustreError: 10029:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:32 nmd031i kernel: LustreError: 9949:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:32 nmd031i kernel: LustreError: 10003:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:32 nmd031i kernel: LustreError: 9967:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:32 nmd031i kernel: LustreError: 10042:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:32 nmd031i kernel: LustreError: 9936:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:32 nmd031i kernel: LustreError: 6081:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:32 nmd031i kernel: LustreError: 9960:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:32 nmd031i kernel: LustreError: 9924:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:32 nmd031i kernel: LustreError: 9994:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:36 nmd031i kernel: LustreError: 9981:0:(fsfilt-ldiskfs.c:2243:fsfilt_ldiskfs_dquot()) operate dquot before it's enabled!
Aug 23 10:01:36 nmd031i kernel: LustreError: 9981:0:(fsfilt-ldiskfs.c:2243:fsfilt_ldiskfs_dquot()) Skipped 10 previous similar messages
Aug 23 10:01:36 nmd031i kernel: LustreError: 9981:0:(quota_master.c:219:lustre_dqget()) can't read dquot from admin quotafile! (rc:-5)
Aug 23 10:01:36 nmd031i kernel: LustreError: 9981:0:(quota_master.c:219:lustre_dqget()) Skipped 10 previous similar messages
Aug 23 10:01:36 nmd031i kernel: LustreError: 9981:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:36 nmd031i kernel: LustreError: 10030:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:36 nmd031i kernel: LustreError: 9929:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:36 nmd031i kernel: LustreError: 9939:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:36 nmd031i kernel: LustreError: 10014:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:36 nmd031i kernel: LustreError: 10000:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:36 nmd031i kernel: LustreError: 9985:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
Aug 23 10:01:36 nmd031i kernel: LustreError: 9923:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
....
Comment by Niu Yawei (Inactive) [ 27/Aug/13 ]

I have a few questions:

1. What's the source of these error messages? quotacheck? It looks like quota wasn't enabled.
2. Are the numbers in file_size_20130821.txt in bytes?

If there is really such a huge gap between quota usage & file size, I suspect there could be some orphan objects belonging to the user. Could you try "lfs quota -v" to see the detailed quota usage?

Comment by Shuichi Ihara (Inactive) [ 27/Aug/13 ]

1. What's the source of these error messages? quotacheck? It looks like quota wasn't enabled.

These messages always show up during normal operation, even right now. Quota has been enabled; we didn't disable it.
We collected debug information while running 'lfs quotacheck'. I will post it soon.

2. Are the numbers in file_size_20130821.txt in bytes?

Yes, this is a summary of 'ls' output.

If there is really such a huge gap between quota usage & file size, I suspect there could be some orphan objects belonging to the user. Could you try "lfs quota -v" to see the detailed quota usage?

Sure, we will get that and post it here.

Comment by Shuichi Ihara (Inactive) [ 27/Aug/13 ]

This is the debug output collected while running the 'lfs quotacheck' and 'lfs quota -u xxx' commands.

Comment by Mitsuhiro Nishizawa [ 28/Aug/13 ]

Hi, here is the output of "lfs quota -u kawashin -v /nshare2".

[root@wk2 ~]# lfs quota -u kawashin -v /nshare2
Disk quotas for user kawashin (uid 14520):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
       /nshare2 10904878092       0       0       -   14157       0       0       -
nshare2-MDT0000_UUID
                   2736       -       0       -   14157       -       0       -
nshare2-OST0000_UUID
                889661136       -       0       -       -       -       -       -
nshare2-OST0001_UUID
                1598348       -       0       -       -       -       -       -
nshare2-OST0002_UUID
                 939816       -       0       -       -       -       -       -
nshare2-OST0003_UUID
                2554020       -       0       -       -       -       -       -
nshare2-OST0004_UUID
                3709408       -       0       -       -       -       -       -
nshare2-OST0005_UUID
                907519088       -       0       -       -       -       -       -
nshare2-OST0006_UUID
                908373712       -       0       -       -       -       -       -
nshare2-OST0007_UUID
                905286696       -       0       -       -       -       -       -
nshare2-OST0008_UUID
                1015128       -       0       -       -       -       -       -
nshare2-OST0009_UUID
                 578096       -       0       -       -       -       -       -
nshare2-OST000a_UUID
                1666080       -       0       -       -       -       -       -
nshare2-OST000b_UUID
                1009400       -       0       -       -       -       -       -
nshare2-OST000c_UUID
                917450860       -       0       -       -       -       -       -
nshare2-OST000d_UUID
                902701472       -       0       -       -       -       -       -
nshare2-OST000e_UUID
                2146448       -       0       -       -       -       -       -
nshare2-OST000f_UUID
                4541320       -       0       -       -       -       -       -
nshare2-OST0010_UUID
                 594752       -       0       -       -       -       -       -
nshare2-OST0011_UUID
                1957092       -       0       -       -       -       -       -
nshare2-OST0012_UUID
                911761044       -       0       -       -       -       -       -
nshare2-OST0013_UUID
                1382416       -       0       -       -       -       -       -
nshare2-OST0014_UUID
                898266076       -       0       -       -       -       -       -
nshare2-OST0015_UUID
                1392196       -       0       -       -       -       -       -
nshare2-OST0016_UUID
                1201028       -       0       -       -       -       -       -
nshare2-OST0017_UUID
                 690192       -       0       -       -       -       -       -
nshare2-OST0018_UUID
                1400884       -       0       -       -       -       -       -
nshare2-OST0019_UUID
                 909708       -       0       -       -       -       -       -
nshare2-OST001a_UUID
                897309256       -       0       -       -       -       -       -
nshare2-OST001b_UUID
                890611760       -       0       -       -       -       -       -
nshare2-OST001c_UUID
                1325556       -       0       -       -       -       -       -
nshare2-OST001d_UUID
                 808944       -       0       -       -       -       -       -
nshare2-OST001e_UUID
                 667016       -       0       -       -       -       -       -
nshare2-OST001f_UUID
                 558356       -       0       -       -       -       -       -
nshare2-OST0020_UUID
                1013192       -       0       -       -       -       -       -
nshare2-OST0021_UUID
                1164864       -       0       -       -       -       -       -
nshare2-OST0022_UUID
                4285100       -       0       -       -       -       -       -
nshare2-OST0023_UUID
                 945804       -       0       -       -       -       -       -
nshare2-OST0024_UUID
                1489132       -       0       -       -       -       -       -
nshare2-OST0025_UUID
                 991472       -       0       -       -       -       -       -
nshare2-OST0026_UUID
                 946508       -       0       -       -       -       -       -
nshare2-OST0027_UUID
                917889588       -       0       -       -       -       -       -
nshare2-OST0028_UUID
                1156456       -       0       -       -       -       -       -
nshare2-OST0029_UUID
                913405936       -       0       -       -       -       -       -
[root@wk2 ~]# 
Comment by Niu Yawei (Inactive) [ 28/Aug/13 ]
00002000:00000001:11:1377565151.521295:0:9987:0:(fsfilt-ldiskfs.c:2240:fsfilt_ldiskfs_dquot()) Process entered
00002000:00020000:11:1377565151.521295:0:9987:0:(fsfilt-ldiskfs.c:2243:fsfilt_ldiskfs_dquot()) operate dquot before it's enabled!
00002000:00000001:11:1377565151.521296:0:9987:0:(fsfilt-ldiskfs.c:2244:fsfilt_ldiskfs_dquot()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
00040000:00020000:11:1377565151.521297:0:9987:0:(quota_master.c:219:lustre_dqget()) can't read dquot from admin quotafile! (rc:-5)
00040000:00000001:11:1377565151.521297:0:9987:0:(quota_master.c:180:lustre_dqput()) Process entered
00040000:00000001:11:1377565151.521298:0:9987:0:(quota_master.c:189:lustre_dqput()) Process leaving
00040000:00000001:11:1377565151.521299:0:9987:0:(quota_master.c:221:lustre_dqget()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
00040000:00000001:11:1377565151.521300:0:9987:0:(quota_master.c:356:dqacq_handler()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
00010000:00020000:11:1377565151.521301:0:9987:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)

The log shows that quota wasn't enabled on the MDS, so I suppose quotacheck didn't actually do anything. I didn't see anything related to quotacheck in the log either. Could you re-run quotacheck to see if it enables quota properly? If it does, please check whether the quota usage is fixed after quotacheck; if it can't enable quota, please enable D_QUOTA & D_TRACE and collect the log while running 'lfs quotacheck' (hopefully we can find out from the log why quotacheck failed). Thanks a lot.

Comment by Mitsuhiro Nishizawa [ 03/Sep/13 ]

debug log with D_QUOTA & D_TRACE.

Comment by Mitsuhiro Nishizawa [ 03/Sep/13 ]

Hi, we have received a debug log with D_QUOTA & D_TRACE enabled and captured after quotacheck. Thanks,

Comment by Niu Yawei (Inactive) [ 03/Sep/13 ]

Thank you, Mitsuhiro. I didn't see anything abnormal in the log. Did quotacheck fix the problem (the incorrect usage), and did it enable quota successfully? If it didn't fix the problem, could you try the following commands to capture the log (I didn't see anything related to quotacheck in this log)? A consolidated sketch of these steps follows the list.

  • Enable D_QUOTA & D_TRACE on both MDS and all OSS by "lctl set_param debug=quota; lctl set_param debug=trace";
  • Clear debug log buffer on both MDS & OSS by "lctl clear";
  • Start debug_daemon on both MDS & OSS by "lctl debug_daemon start $tmp_filename 500";
  • Run quotacheck on client "lfs quotacheck -ug";
  • After quotacheck is done, stop the debug daemon on the MDS & OSS by "lctl debug_daemon stop";
  • Convert debug file into text file by "lctl debug_file $tmp_filename $log_filename";
  • Collect all the log files ($log_filename) and attach them here;
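
For reference, a consolidated sketch of the steps above, run on the MDS and each OSS (the two debug flags are set in one mask here; file names are placeholders):

# on the MDS and on every OSS
lctl set_param debug="quota trace"        # enable D_QUOTA and D_TRACE
lctl clear                                # empty the debug log buffer
lctl debug_daemon start /tmp/lustre-debug.bin 500

# on a client
lfs quotacheck -ug /nshare2

# back on the MDS and every OSS, once quotacheck has finished
lctl debug_daemon stop
lctl debug_file /tmp/lustre-debug.bin /tmp/lustre-debug.txt
# attach the resulting /tmp/lustre-debug.txt files to the ticket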
Comment by Mitsuhiro Nishizawa [ 04/Sep/13 ]

Hi Niu, we have captured the log again, but this log may not contain the full quotacheck output either, since it only covers 19 seconds.
Please kindly check the log and let us know if there is anything worth noting.

We put the log in the following location as the size is 43MB.
https://woscloud.corp.ddn.com/v2/files/ZjAxZTk1YjlkZWI0ZDE0ZGEzZTExNDI1MWM5NGRiZTZi/content/inline/lctl_debug_20130904.out.gz

Comment by Niu Yawei (Inactive) [ 11/Sep/13 ]

Hi, Mitsuhiro

It seems there isn't any error message in the log. Did quotacheck turn quota on? (When you run 'lfs quota -u xxx', is there any error message on the MDS/OSS?) Has the incorrect usage problem been fixed?

Comment by Mitsuhiro Nishizawa [ 11/Sep/13 ]

Hi Niu,
Quota should be on, and we do not see any errors on the MDS/OSS when issuing 'lfs quota' (occasionally, "still haven't managed to acquire quota space..." is output).
The incorrect usage has not been fixed. Here is the current output.

[root@wk2 ~]# lfs quota -u kawashin /nshare2
Disk quotas for user kawashin (uid 14520):
Filesystem kbytes quota limit grace files quota limit grace
/nshare2 21433982796 0 0 - 14157 0 0 -
[root@wk2 ~]#

Were you able to find the log from when quotacheck was issued? Should we capture it again using a bigger file size?

Comment by Shuichi Ihara (Inactive) [ 12/Sep/13 ]

Hi Niu,

As Mitsuhiro mentioned, quota is enabled but this problem is not fixed yet. An incorrect quota size is still visible in the "lfs quota" output.
As one possibility, could this be a similar issue to LU-860? Please advise.

Comment by Niu Yawei (Inactive) [ 12/Sep/13 ]

As Mitsuhiro mentioned, quota is enabled but this problem is not fixed yet. An incorrect quota size is still visible in the "lfs quota" output.
As one possibility, could this be a similar issue to LU-860? Please advise.

Yes, if quotacheck doesn't help, it's possible that orphan objects have leaked. Could you follow the instructions in LU-860 to remove those orphans? Thanks.

Comment by Shuichi Ihara (Inactive) [ 12/Sep/13 ]

OK, but it seems we need to stop Lustre.
We want to check whether the PENDING directory exists without stopping Lustre. Can we do that? Any ideas?
Then, if we confirm this is the same issue as LU-860, we will stop Lustre to fix it.

Comment by Niu Yawei (Inactive) [ 12/Sep/13 ]

The PENDING directory is always created by Lustre once the OST/MDT is mounted, so I'm not sure there is a good way to check the PENDING directory online (lfsck can check for orphan objects, but it is very slow).

I think you can choose one OST with the most significant quota usage (as seen from lfs quota -v) to stop, and see if there are orphan objects belonging to the user in PENDING.

Comment by Mitsuhiro Nishizawa [ 13/Sep/13 ]

Hi Niu,
So the only way to check whether there are orphaned objects is to mount the MDT/OST as ldiskfs and see if there is a PENDING* directory.
When we find the directory, what should we do?
LU-860 states that '"back to namespace"+unlink' worked, but what does "back to namespace" mean specifically? Is unlink just removing the files with the 'rm' or 'unlink' command?

Comment by Niu Yawei (Inactive) [ 16/Sep/13 ]

The PENDING directory is created on the MDS to hold open-unlinked files. If you never renamed the PENDING directory (as mentioned in LU-860), there should be only one PENDING directory on the MDS. (Sorry, my previous comment was not quite right and was misleading.)

LU-860 states that '"back to namespace"+unlink' worked, but what does "back to namespace" mean specifically? Is unlink just removing the files with the 'rm' or 'unlink' command?

You need to mount the MDT as ldiskfs and check whether there are lots of files under the PENDING directory, then check whether the file owner is the uid that has the incorrect quota usage. If there are such files, you need to do "back to namespace" + "unlink" (a rough sketch follows the two steps below):

  • Move the files from PENDING to ROOT directory (back to namespace);
  • Mount as Lustre, and unlink those files from Lustre client (unlink);
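
A rough sketch of these two steps, assuming the MDT device is /dev/mdtdev and /mnt/mdt is a scratch mount point (both names are placeholders):

# with the MDT stopped, mount it as ldiskfs
mount -t ldiskfs /dev/mdtdev /mnt/mdt
ls -l /mnt/mdt/PENDING                  # check the owners of the orphaned files
mv /mnt/mdt/PENDING/* /mnt/mdt/ROOT/    # "back to namespace"
umount /mnt/mdt

# restart the MDT as Lustre, then remove the recovered files from any client
rm /nshare2/<moved file names>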
Comment by Mitsuhiro Nishizawa [ 18/Sep/13 ]

Many thanks, Niu! Can you clarify the following? We want to be doubly sure, as there is only a limited chance / time to do maintenance.

> Move the files from PENDING to ROOT directory (back to namespace);
to the ROOT directory of ldiskfs, correct?

> Mount as Lustre, and unlink those files from Lustre client (unlink);
Can we determine the path and file name from what we can see under the PENDING directory?
(We are not sure in what form the files exist under the PENDING directory, though... file/FID?)

Comment by Niu Yawei (Inactive) [ 18/Sep/13 ]

Yes, move them to the ROOT directory of ldiskfs. All filenames under the PENDING directory are composed of i_no:i_generation; once you move those files into ROOT, you can see them from a Lustre client.

Comment by Mitsuhiro Nishizawa [ 19/Sep/13 ]

Niu, what will a file look like after we put it back into the namespace?
If I understand correctly, that file will be somewhere in the Lustre file system and we will need to search for it by i_no/i_generation or file name.
Can we distinguish it easily from other files which should not (never) be unlinked? Are the i_no and file name the only hints?
e.g. does 'ls -l' show '?', as in the case where an inode entry exists on the MDT but there is no object on the OST... Many thanks,

Comment by Niu Yawei (Inactive) [ 19/Sep/13 ]

You could create a directory "LU-3839" in the Lustre root (then you'll see this directory under ROOT once you mount the MDT as ldiskfs), and move all files in PENDING into the LU-3839 directory. When you mount Lustre again, you can easily find all the files under the LU-3839 directory.

Comment by Mitsuhiro Nishizawa [ 19/Sep/13 ]

Understood. Many thanks!

Comment by Mitsuhiro Nishizawa [ 20/Sep/13 ]

1. What can be a cause (or trigger) of the LU-860 issue? It looks like no patch is available for this issue. Will this issue be fixed in the Lustre code?
2. The customer needs to stop their service to confirm whether they are really affected by the LU-860 issue and, if so, to fix it. They are concerned that there might be no files under the PENDING directory, in which case our investigation would not make progress even though they stopped their service. Their expectation is that we should at least identify the cause of the incorrect quota report and confirm the action required to fix it. Is there anything we can do to fix the issue if the files under PENDING are not the culprit?

Comment by Niu Yawei (Inactive) [ 22/Sep/13 ]

1. What can be a cause (or trigger) of the LU-860 issue? It looks like no patch is available for this issue. Will this issue be fixed in the Lustre code?

When a client failure happens while unlinking a file, orphans can be generated on the MDS or OST. These orphans should be cleared automatically when the MDS/OST restarts, but such orphan cleanup can fail as well. For LU-860, I think it was caused by the user renaming the PENDING directory manually (to work around other problems, see LU-601).

2. The customer needs to stop their service to confirm whether they are really affected by the LU-860 issue and, if so, to fix it. They are concerned that there might be no files under the PENDING directory, in which case our investigation would not make progress even though they stopped their service. Their expectation is that we should at least identify the cause of the incorrect quota report and confirm the action required to fix it. Is there anything we can do to fix the issue if the files under PENDING are not the culprit?

Is the inode usage for the user correct? If the inode usage isn't correct either, I strongly suspect that there are orphans in the PENDING dir. If the inode usage is correct, there are probably orphans on the OSTs (given that quotacheck completed successfully).

Comment by Mitsuhiro Nishizawa [ 24/Sep/13 ]

The inode usage is a bit incorrect. From the list of user files provided when we created this ticket, the number of files is 14145, while quota was showing 14157.
$ wc file_size_20130821.txt
14145 127380 1875155 file_size_20130821.txt

On the other hand, the disk usage showed a big difference...
BTW, I noticed only now that 'lfs quota' in the description showed 7TB usage, but the 'lfs quota -v' output above showed 10TB usage while the inode count had not changed.
...I checked the current 'lfs quota' output and noticed that the disk usage now shows the correct value.
It is not clear when this changed. Do you have any idea why it occurred? We will attach the recent messages files.

[root@wk2 ~]# lfs quota -u kawashin /nshare2
Disk quotas for user kawashin (uid 14520):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
       /nshare2 65116548       0       0       -   14145       0       0       -

[root@wk2 ~]# lfs quota -v -u kawashin /nshare2
Disk quotas for user kawashin (uid 14520):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
       /nshare2 65116548       0       0       -   14145       0       0       -
nshare2-MDT0000_UUID
                   2736       -       0       -   14145       -       0       -
nshare2-OST0000_UUID
                 933072       -       0       -       -       -       -       -
nshare2-OST0001_UUID
                1598348       -       0       -       -       -       -       -
nshare2-OST0002_UUID
                 939816       -       0       -       -       -       -       -
nshare2-OST0003_UUID
                2554020       -       0       -       -       -       -       -
nshare2-OST0004_UUID
                3709408       -       0       -       -       -       -       -
nshare2-OST0005_UUID
                1626504       -       0       -       -       -       -       -
nshare2-OST0006_UUID
                1552952       -       0       -       -       -       -       -
nshare2-OST0007_UUID
                 938512       -       0       -       -       -       -       -
nshare2-OST0008_UUID
                1015128       -       0       -       -       -       -       -
nshare2-OST0009_UUID
                 578096       -       0       -       -       -       -       -
nshare2-OST000a_UUID
                1666080       -       0       -       -       -       -       -
nshare2-OST000b_UUID
                1009400       -       0       -       -       -       -       -
nshare2-OST000c_UUID
                1113732       -       0       -       -       -       -       -
nshare2-OST000d_UUID
                1607584       -       0       -       -       -       -       -
nshare2-OST000e_UUID
                2146448       -       0       -       -       -       -       -
nshare2-OST000f_UUID
                4541320       -       0       -       -       -       -       -
nshare2-OST0010_UUID
                 594752       -       0       -       -       -       -       -
nshare2-OST0011_UUID
                1957092       -       0       -       -       -       -       -
nshare2-OST0012_UUID
                1275508       -       0       -       -       -       -       -
nshare2-OST0013_UUID
                1382416       -       0       -       -       -       -       -
nshare2-OST0014_UUID
                3053048       -       0       -       -       -       -       -
nshare2-OST0015_UUID
                1392196       -       0       -       -       -       -       -
nshare2-OST0016_UUID
                1201028       -       0       -       -       -       -       -
nshare2-OST0017_UUID
                 690192       -       0       -       -       -       -       -
nshare2-OST0018_UUID
                1400884       -       0       -       -       -       -       -
nshare2-OST0019_UUID
                 909708       -       0       -       -       -       -       -
nshare2-OST001a_UUID
                2505724       -       0       -       -       -       -       -
nshare2-OST001b_UUID
                1033740       -       0       -       -       -       -       -
nshare2-OST001c_UUID
                1325556       -       0       -       -       -       -       -
nshare2-OST001d_UUID
                 808944       -       0       -       -       -       -       -
nshare2-OST001e_UUID
                 667016       -       0       -       -       -       -       -
nshare2-OST001f_UUID
                 558356       -       0       -       -       -       -       -
nshare2-OST0020_UUID
                1013192       -       0       -       -       -       -       -
nshare2-OST0021_UUID
                1164864       -       0       -       -       -       -       -
nshare2-OST0022_UUID
                4285100       -       0       -       -       -       -       -
nshare2-OST0023_UUID
                 945804       -       0       -       -       -       -       -
nshare2-OST0024_UUID
                1489132       -       0       -       -       -       -       -
nshare2-OST0025_UUID
                 991472       -       0       -       -       -       -       -
nshare2-OST0026_UUID
                 946508       -       0       -       -       -       -       -
nshare2-OST0027_UUID
                1310688       -       0       -       -       -       -       -
nshare2-OST0028_UUID
                1156456       -       0       -       -       -       -       -
nshare2-OST0029_UUID
                3524016       -       0       -       -       -       -       -
Comment by Mitsuhiro Nishizawa [ 24/Sep/13 ]

messages files from MDS/OSS on 20130924

Comment by Johann Lombardi (Inactive) [ 24/Sep/13 ]

I agree with Niu, you likely have open-unlinked files in the PENDING directory that haven't been cleaned up yet.
As for the "inconsistency" between lfs quota and lfs quota -v, I am not sure how you ended up with 10TB:

$ bc -l
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'. 
2736+933072+1598348+939816+2554020+3709408+1626504+1552952+938512+1015128+578096+1666080+1009400+1113732+1607584+2146448+4541320+594752+1957092+1275508+1382416+3053048+1392196+1201028+690192+1400884+909708+2505724+1033740+1325556+808944+667016+558356+1013192+1164864+4285100+945804+1489132+991472+946508+1310688+1156456+3524016
65116548

For me, the two results match perfectly.

Comment by Mitsuhiro Nishizawa [ 25/Sep/13 ]

Why had the PENDING directory not been cleaned up for so long? The customer was seeing incorrect quota usage (as much as 7TB) for at least two weeks. What triggered the cleanup?
They use the quota usage to know how much space each user is using. If this can occur during normal operation, they cannot trust the quota output at all.

Comment by Johann Lombardi (Inactive) [ 25/Sep/13 ]

As mentioned earlier by Niu, if the PENDING directory was renamed manually, then the files present in this directory have never been cleaned up.
Could you please check with debugfs that:
1. the PENDING directory has indeed been renamed & recreated
2. there are indeed files under this directory.

If so, and if the fix for LU-601 is applied, I would advise shutting down the MDT, moving the open-unlinked files from the renamed PENDING dir to the real PENDING directory, and restarting the MDT.

Comment by Mitsuhiro Nishizawa [ 26/Sep/13 ]

1. the PENDING directory has indeed been renamed & recreated
We think the customer would never do this. How can we check this with debugfs?

2. there are indeed files under this directory.
Can this be checked with debugfs while the MDT is serving Lustre? Niu said we cannot.

What the customer is most concerned about is whether the current quota output is really correct.
Can we say the quota output is correct when there are no files under the PENDING directory (currently, the quota usage for the user apparently looks correct)?
We see many log messages like the one below. What does this mean? Is this unrelated to the incorrect usage problem? (If we should create a new ticket, please let us know.)
Sep 23 22:18:20 nos071i kernel: Lustre: 13850:0:(quota_interface.c:491:quota_chk_acq_common()) still haven't managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)

Comment by Johann Lombardi (Inactive) [ 26/Sep/13 ]

We think the customer would never do this.

Niu thought that such an action might have been done to address LU-860. Is that plausible?

How can we check this with debugfs?

1. run "debugfs $device_path"
2. "ls" to check how many PENDING directories we have
3. "ls" against all PENDING* directories to check if we have any files in there

Can this be checked with debugfs while the MDT is serving Lustre? Niu said we cannot.

Yes, you can run debugfs in read-only mode.
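
For illustration, a minimal read-only invocation along these lines (the MDT device path /dev/mdtdev is a placeholder):

# -c opens the filesystem in catastrophic (read-only) mode; -R runs a single request
debugfs -c -R 'ls -l /PENDING' /dev/mdtdev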

What the customer is most concerned about is whether the current quota output is really correct.
Can we say the quota output is correct when there are no files under the PENDING directory (currently, the quota usage for the user apparently looks correct)?

Since you successfully ran quotacheck, accounting is very likely correct.

We see many log messages like the one below. What does this mean? Is this unrelated to the incorrect usage problem? (If we should create a new ticket, please let us know.)
Sep 23 22:18:20 nos071i kernel: Lustre: 13850:0:(quota_interface.c:491:quota_chk_acq_common()) still haven't managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)

I don't think it is related to a potential incorrect accounting. It means that it takes several iterations before acquiring space from the master, probably because of contention due to many threads trying to get quota space for the same ID. The "cycle" value seems to be always 10, so it means that the thread finally got space and those messages should be harmless.

Comment by Mitsuhiro Nishizawa [ 02/Oct/13 ]

Hi, we checked the PENDING directory and found many files under it. The timestamps of the files are quite old; the oldest is from February, and we can see many files modified in August.
Currently, we do not see any files owned by the "kawashin" user (ID: 14520).

debugfs:  ls -l
 975747585   40777 (2)      0      0   77824  2-Oct-2013 10:13 .
      2   40755 (2)      0      0    4096  7-Aug-2012 23:30 ..
 957411756  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1ac:4430f8c7
 986228362  100644 (1)  14457  10693       0 30-Sep-2013 09:36 3ac8a68a:4624ac43
 957412285  100640 (1)  14148   1000    1592 18-Sep-2013 14:20 3910f3bd:45f57947
 957411948  100640 (1)  14148   1000       0  5-Aug-2013 16:09 3910f26c:456d16c4
 957411955  100640 (1)  14148   1000       0  6-Aug-2013 15:00 3910f273:456ed599
 966319255  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dc97:455a5a09
 966319282  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dcb2:455a5a5b
 957411960  100640 (1)  14148   1000    1432  6-Aug-2013 15:00 3910f278:456ed5a9
 957412018  100640 (1)  14148   1000   34128  6-Aug-2013 15:00 3910f2b2:456ed5ca
 957412276  100640 (1)  14148   1000    1808 18-Sep-2013 14:20 3910f3b4:45f57935
 957412293  100640 (1)  14148   1000    5240 18-Sep-2013 14:20 3910f3c5:45f57957
 957411772  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1bc:4430f887
 957410622  100640 (1)  14148   1000       0  9-May-2013 09:58 3910ed3e:4430f89b
 957410623  100640 (1)  14148   1000       0  9-May-2013 09:58 3910ed3f:4430f8a1
 957411779  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1c3:4430f8a9
 957411789  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1cd:4430f8ab
 957411790  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1ce:4430f8ad
 957411791  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1cf:4430f8af
 957411876  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f224:4430f8b3
 957411761  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1b1:4430f8c3
 957411751  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1a7:4430f8c5
 957412004  100640 (1)  14148   1000    1808  6-Aug-2013 15:00 3910f2a4:456ed58d
 957412284  100640 (1)  14148   1000    1544 18-Sep-2013 14:20 3910f3bc:45f57945
 957411962  100640 (1)  14148   1000    2784  6-Aug-2013 15:00 3910f27a:456ed5af
 957412019  100640 (1)  14148   1000   15504  6-Aug-2013 15:00 3910f2b3:456ed5cc
 957412312  100640 (1)  14148   1000    1992 18-Sep-2013 14:20 3910f3d8:45f57923
 966319285  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dcb5:455a5a33
 957410600  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910ed28:455a5ae6
 957412005  100640 (1)  14148   1000       0  6-Aug-2013 15:00 3910f2a5:456ed58f
 957411959  100640 (1)  14148   1000       0  6-Aug-2013 15:00 3910f277:456ed5a7
 957412299  100640 (1)  14148   1000   15504 18-Sep-2013 14:20 3910f3cb:45f57963
 957411744  100640 (1)  14148   1000       0 24-Sep-2013 13:32 3910f1a0:46136e2d
 957411747  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1a3:4430f8b7
 957411759  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1af:4430f8bf
 966319287  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dcb7:455a5a25
 957410603  100640 (1)  14148   1000    4328  6-Aug-2013 15:00 3910ed2b:456ed589
 957412272  100640 (1)  14148   1000   27280 18-Sep-2013 14:20 3910f3b0:45f5792b
 957410606  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910ed2e:455a5afa
 965848243  100640 (1)  14148   1000     280  5-Aug-2013 16:10 3991acb3:456d1746
 957412002  100640 (1)  14148   1000       0  6-Aug-2013 15:00 3910f2a2:456ed57b
 966319275  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dcab:455a5a4d
 957411961  100640 (1)  14148   1000    3560  6-Aug-2013 15:00 3910f279:456ed5ac
 957412015  100640 (1)  14148   1000    9136  6-Aug-2013 15:00 3910f2af:456ed5bd
 957412269  100640 (1)  14148   1000   16496 18-Sep-2013 14:20 3910f3ad:45f57925
 957412292  100640 (1)  14148   1000    7000 18-Sep-2013 14:20 3910f3c4:45f57955
 957412295  100640 (1)  14148   1000    9136 18-Sep-2013 14:20 3910f3c7:45f5795b
 957410611  100640 (1)  14148   1000       0 24-Sep-2013 13:32 3910ed33:46136e1d
 957411878  100640 (1)  14148   1000       0 24-Sep-2013 13:32 3910f226:46136e29
 957412000  100640 (1)  14148   1000   16496  6-Aug-2013 15:00 3910f2a0:456ed569
 974177487  100600 (1)  14126  10756       0 29-Jul-2013 17:04 3a10c4cf:4552f7b9
 957412266  100640 (1)  14148   1000    3080 18-Sep-2013 14:20 3910f3aa:45f57931
 957412275  100640 (1)  14148   1000    4328 18-Sep-2013 14:20 3910f3b3:45f57933
 966319269  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dca5:455a5a45
 966319281  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dcb1:455a5a59
 966319283  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dcb3:455a5a5d
 957411946  100640 (1)  14148   1000       0  5-Aug-2013 16:09 3910f26a:456d16c0
 957411947  100640 (1)  14148   1000       0  5-Aug-2013 16:09 3910f26b:456d16c2
 957411762  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910f1b2:455a5b06
 957411910  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910f246:455a5b18
 957411938  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910f262:455a5b2a
 986228498  100644 (1)  14457  10693       0 30-Sep-2013 09:41 3ac8a712:4624ae85
 957412006  100640 (1)  14148   1000    1432  6-Aug-2013 15:00 3910f2a6:456ed591
 957411758  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1ae:4430f8c1
 986228399  100644 (1)  14457  10693       0 30-Sep-2013 09:36 3ac8a6af:4624ac44
 957412289  100640 (1)  14148   1000    3560 18-Sep-2013 14:20 3910f3c1:45f5794f
 957412014  100640 (1)  14148   1000    2920  6-Aug-2013 15:00 3910f2ae:456ed5b9
 966319257  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dc99:455a5a29
 966319278  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dcae:455a5a53
 957412296  100640 (1)  14148   1000    5936 18-Sep-2013 14:20 3910f3c8:45f5795d
 957411914  100640 (1)  14148   1000       0  5-Aug-2013 16:09 3910f24a:456d16b3
 957411899  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910f23b:455a5b14
 957411934  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910f25e:455a5b22
 957411935  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910f25f:455a5b24
      0       0 (1)      0      0       0                   386b27d7:462c8b08
 957412011  100640 (1)  14148   1000    1456  6-Aug-2013 15:00 3910f2ab:456ed5b1
 957412283  100640 (1)  14148   1000    1592 18-Sep-2013 14:20 3910f3bb:45f57943
 966319266  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dca2:455a5a3b
 957412274  100640 (1)  14148   1000   34728 18-Sep-2013 14:20 3910f3b2:45f5792f
 957411944  100640 (1)  14148   1000       0  5-Aug-2013 16:09 3910f268:456d16bc
 957411748  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1a4:4430f8b9
 957411912  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910f248:455a5b1a
 957411936  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910f260:455a5b26
 996731325  100644 (1)  14527  10771   11899 21-Apr-2013 14:30 3b68e9bd:4410aa43
 957411777  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1c1:4430f8a3
 996713954  100644 (1)  14527  10771       0 22-Apr-2013 21:44 3b68a5e2:441201ad
 957412290  100640 (1)  14148   1000    2784 18-Sep-2013 14:20 3910f3c2:45f57951
 957411783  100640 (1)  14148   1000       0 24-Sep-2013 13:32 3910f1c7:46136e23
 957411746  100640 (1)  14148   1000       0 24-Sep-2013 13:32 3910f1a2:46136e31
 957411952  100640 (1)  14148   1000    3080  6-Aug-2013 15:00 3910f270:456ed587
 957411793  100640 (1)  14148   1000       0 24-Sep-2013 13:32 3910f1d1:46136e25
 946549666  100600 (1)     27     27       0 19-Feb-2013 21:14 386b33a2:4383ad98
 957411909  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910f245:455a5b16
 957411937  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910f261:455a5b28
 957411792  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1d0:4430f8b1
 957411956  100640 (1)  14148   1000       0  6-Aug-2013 15:00 3910f274:456ed59b
 957411957  100640 (1)  14148   1000       0  6-Aug-2013 15:00 3910f275:456ed5a1
 957412277  100640 (1)  14148   1000   15392 18-Sep-2013 14:20 3910f3b5:45f57937
 957412288  100640 (1)  14148   1000    1432 18-Sep-2013 14:20 3910f3c0:45f5794d
 957411755  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1ab:4430f885
 957411774  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1be:4430f88d
 957411786  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1ca:4430f89f
 957411760  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1b0:4430f897
 946562084  100600 (1)     27     27       0 19-Feb-2013 21:14 386b6424:4383ad9a
 957412281  100640 (1)  14148   1000    1528 18-Sep-2013 14:20 3910f3b9:45f5793f
 974177590  100600 (1)  14126  10756       0 29-Jul-2013 17:04 3a10c536:4552f7ba
 966319254  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dc96:455a5a15
 966319267  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dca3:455a5a41
 957412017  100640 (1)  14148   1000    9160  6-Aug-2013 15:00 3910f2b1:456ed5c8
 957411949  100640 (1)  14148   1000       0  5-Aug-2013 16:09 3910f26d:456d16c6
 957410609  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910ed31:455a5b00
 957411771  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1bb:4430f895
 957410630  100640 (1)  14148   1000       0 24-Sep-2013 13:32 3910ed46:46136e2b
 957411785  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1c9:4430f893
 957411951  100640 (1)  14148   1000   34728  6-Aug-2013 15:00 3910f26f:456ed583
 957412282  100640 (1)  14148   1000    1552 18-Sep-2013 14:20 3910f3ba:45f57941
 947502954  100644 (1)  14147   1000    2583  1-Aug-2013 13:56 3879bf6a:455a9f0c
 957412270  100640 (1)  14148   1000   11512 18-Sep-2013 14:20 3910f3ae:45f57927
 966319264  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dca0:455a5a3d
 957411945  100640 (1)  14148   1000       0  5-Aug-2013 16:09 3910f269:456d16be
 957411750  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1a6:4430f8bd
 957410605  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910ed2d:455a5af8
 957412010  100640 (1)  14148   1000       0  6-Aug-2013 15:00 3910f2aa:456ed59e
 966319274  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dcaa:455a5a4b
 957412291  100640 (1)  14148   1000    1456 18-Sep-2013 14:20 3910f3c3:45f57953
 957412294  100640 (1)  14148   1000    2920 18-Sep-2013 14:20 3910f3c6:45f57959
 957412298  100640 (1)  14148   1000   34128 18-Sep-2013 14:20 3910f3ca:45f57961
 957410626  100640 (1)  14148   1000       0 24-Sep-2013 13:32 3910ed42:46136e27
 986228494  100644 (1)  14457  10693       0 30-Sep-2013 09:41 3ac8a70e:4624ae84
 957411787  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1cb:4430f8a7
 957411752  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1a8:4430f8b5
 957410619  100640 (1)  14148   1000       0  9-May-2013 09:58 3910ed3b:4430f899
 957412271  100640 (1)  14148   1000   27408 18-Sep-2013 14:20 3910f3af:45f57929
 966319263  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dc9f:455a5a35
 946549314  100600 (1)     27     27       0 19-Feb-2013 21:14 386b3242:4383ad97
 966319268  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dca4:455a5a43
 966319270  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dca6:455a5a47
 966319276  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dcac:455a5a4f
 966319277  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dcad:455a5a51
 957412280  100640 (1)  14148   1000    1512 18-Sep-2013 14:20 3910f3b8:45f5793d
 1007223734  100755 (1)  14581  10888   35201  5-Sep-2013 19:58 3c0903b6:45cbc972
 957411763  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1b3:4430f88f
 957411883  100640 (1)  14148   1000   11512  6-Aug-2013 15:00 3910f22b:456ed56b
 957410599  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910ed27:455a5ae4
 957411931  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910f25b:455a5b20
 957412001  100640 (1)  14148   1000       0  6-Aug-2013 15:00 3910f2a1:456ed572
 966319258  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dc9a:455a5a2b
 966319280  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dcb0:455a5a57
 1004121320  100644 (1)  14581  10888    7999  6-Sep-2013 13:16 3bd9ace8:45ce741f
 957410625  100640 (1)  14148   1000       0  9-May-2013 09:58 3910ed41:4430f883
 957411757  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1ad:4430f8bb
 957411788  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1cc:4430f8a5
 957412286  100640 (1)  14148   1000    1544 18-Sep-2013 14:20 3910f3be:45f57949
 957412012  100640 (1)  14148   1000    7000  6-Aug-2013 15:00 3910f2ac:456ed5b5
 957412016  100640 (1)  14148   1000    5936  6-Aug-2013 15:00 3910f2b0:456ed5c1
 957412287  100640 (1)  14148   1000    1544 18-Sep-2013 14:20 3910f3bf:45f5794b
 957411941  100640 (1)  14148   1000       0  5-Aug-2013 16:09 3910f265:456d16b5
 957410604  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910ed2c:455a5af6
 957411928  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910f258:455a5b1c
 957411930  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910f25a:455a5b1e
 957412008  100640 (1)  14148   1000       0  6-Aug-2013 15:00 3910f2a8:456ed593
 957412279  100640 (1)  14148   1000    1512 18-Sep-2013 14:20 3910f3b7:45f5793b
 957410612  100640 (1)  14148   1000       0 24-Sep-2013 13:32 3910ed34:46136e21
 957411745  100640 (1)  14148   1000       0 24-Sep-2013 13:32 3910f1a1:46136e2f
 966319256  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dc98:455a5a37
 966319262  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dc9e:455a5a39
 957411942  100640 (1)  14148   1000       0  5-Aug-2013 16:09 3910f266:456d16b8
 957411943  100640 (1)  14148   1000       0  5-Aug-2013 16:09 3910f267:456d16ba
 957410607  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910ed2f:455a5afc
 966319279  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dcaf:455a5a55
 986228475  100644 (1)  14457  10693       0 30-Sep-2013 09:40 3ac8a6fb:4624ae40
 1007224421  100755 (1)  14581  10888   10597  5-Sep-2013 20:09 3c090665:45cbccfc
 957411784  100640 (1)  14148   1000       0  9-May-2013 09:58 3910f1c8:4430f89d
 957412022  100640 (1)  14148   1000    1992  6-Aug-2013 15:00 3910f2b6:456ed567
 986228464  100644 (1)  14457  10693       0 30-Sep-2013 09:40 3ac8a6f0:4624ae3f
 996730516  100755 (1)  14527  10771   1082706  6-Mar-2013 01:57 3b68e694:4410a146
 966319261  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dc9d:455a5a31
 966319265  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dca1:455a5a3f
 966319273  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dca9:455a5a49
 957410598  100640 (1)  14148   1000       0  1-Aug-2013 09:54 3910ed26:455a5ae2
 957411953  100640 (1)  14148   1000       0  6-Aug-2013 15:00 3910f271:456ed595
 957411958  100640 (1)  14148   1000       0  6-Aug-2013 15:00 3910f276:456ed5a3
 946562078  100600 (1)     27     27       0 19-Feb-2013 21:14 386b641e:4383ad99
 957412278  100640 (1)  14148   1000    1432 18-Sep-2013 14:20 3910f3b6:45f57939
 957411950  100640 (1)  14148   1000       0  5-Aug-2013 16:09 3910f26e:456d16c8
 966319252  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dc94:455a5a0b
 966319260  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dc9c:455a5a2f
 946562086  100600 (1)     27     27       0 19-Feb-2013 21:14 386b6426:4383ad9b
      0       0 (1)      0      0       0                   386b27d7:462c58b9
 966319253  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dc95:455a5a0d
 966319259  100644 (1)  14147   1000       0  1-Aug-2013 09:54 3998dc9b:455a5a2d
 957412009  100640 (1)  14148   1000    1528  6-Aug-2013 15:00 3910f2a9:456ed597
 957412013  100640 (1)  14148   1000    5240  6-Aug-2013 15:00 3910f2ad:456ed5b7
 957410618  100640 (1)  14148   1000       0 24-Sep-2013 13:32 3910ed3a:46136e1f
 957412297  100640 (1)  14148   1000    9160 18-Sep-2013 14:20 3910f3c9:45f5795f
 946996450  100640 (1)  14276  10361       0  2-Oct-2013 09:54 387204e2:462ca88a

debugfs:   

Can we retrieve the file name from this FID?
Some files have a size, but many show "0". Why are many files showing "0" size?
Some files do not even show a user/group id. Why did this occur?
Does this output mean these files are all open-unlinked files? i.e. there should be a process which is keeping the file open.
The customer said it is unlikely that a file is kept open for months. Also, they said the usage for the kawashin user reported by lfs quota increased gradually while there was no creation or modification.
As far as we understand this issue, the PENDING directory does not explain this behavior.

Comment by Johann Lombardi (Inactive) [ 02/Oct/13 ]

Currently, we do not see any files owned by the "kawashin" user (ID: 14520).

Shall I understand that there was only one PENDING* directory, just called PENDING?

Can we retrieve the file name from this FID?

I am afraid that the extended attribute storing the name & parent FID has already been updated, so it is not possible.
Anyway, you are running 1.8, so the attribute does not even exist.

Some files have a size, but many show "0". Why are many files showing "0" size?

The size reported on the MDS is just a hint updated at close time. There is no data on the MDS. The actual content of those files is stored in OST objects. The LOV EA attribute should still be valid.

Some files do not even show a user/group id. Why did this occur?

Are the same files still present if you rerun "ls" through debugfs a second time?

Does this output mean these files are all open-unlinked files? i.e. there should be a process which is keeping the file open.

Right, there should be.

The customer said it is unlikely that a file is kept open for months.

Then you should move those files back to the namespace and unlink them. I think this has been advised multiple times (by Niu and myself) and it hasn't been performed yet.

Also, they said the usage for the kawashin user reported by lfs quota increased gradually while there was no creation or modification.
As far as we understand this issue, the PENDING directory does not explain this behavior.

Could you please tell me in detail how many inodes are reported by lfs quota and how many the user thinks there are?
Previously, you said:

The inode usage is a bit incorrect. From the list of user files provided when we created this ticket, the number of files is 14145, while quota was showing 14157.
$ wc file_size_20130821.txt
14145 127380 1875155 file_size_20130821.txt

However, the latest output of lfs quota you provided correctly showed 14145 files:

[root@wk2 ~]# lfs quota -v -u kawashin /nshare2
Disk quotas for user kawashin (uid 14520):
Filesystem kbytes quota limit grace files quota limit grace
/nshare2 65116548 0 0 - 14145 0 0 -
...

Could you please elaborate?

Comment by Shuichi Ihara (Inactive) [ 06/Feb/14 ]

This was not a bug; we finally resolved the situation once the files in the PENDING directory were cleaned up. Thanks for the investigation; please close the ticket.

Comment by Peter Jones [ 06/Feb/14 ]

OK. Thanks, Ihara

Generated at Sat Feb 10 01:37:20 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.