
[LU-2619] Bogus value of dqb_curinodes returned by osc_quotactl

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Minor
    • None
    • Affects Version/s: Lustre 2.4.0
    • 3
    • 6124

    Description

      When running lfs quota -u <USER> <FS> on Sequoia, a couple of users do not have any files in their directories, but quota reports a bogus, very large value in the files column:

      # lfs quota -u pjmccart /p/ls1
      Disk quotas for user pjmccart (uid 8624):
           Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
               /p/ls1     913       0       0       - 18446744073709547865       0       0       -
      
      # du -sh /p/ls1/pjmccart/
      913K    /p/ls1/pjmccart/
      
      # ls -alR /p/ls1/pjmccart/
      /p/ls1/pjmccart/:
      total 1214
      913 drwx------    2 pjmccart pjmccart 934400 Nov 15 10:28 ./
      302 drwxr-xr-x 2193 root     root     308736 Jan 11 08:05 ../ 
      

      Using systemtap to print the obd_quotactl structure when the osc_quotactl function returns, I see odd values coming from two of the OSCs:

      osc_quotactl: "ls1-OST0037-osc-c0000003c865a400": {.qc_cmd=8388867, .qc_type=0, .qc_id=8624, .qc_stat=0, .qc_dqinfo={.dqi_bgrace=0, .dqi_igrace=0, .dqi_flags=0, .dqi_valid=0}, .qc_dqblk={.dqb_bhardlimit=0, .dqb_bsoftlimit=0, .dqb_curspace=0, .dqb_ihardlimit=0, .dqb_isoftlimit=0, .dqb_curinodes=0, .dqb_btime=0, .dqb_itime=0, .dqb_valid=15, .dqb_padding=0}}
      osc_quotactl: "ls1-OST0038-osc-c0000003c865a400": {.qc_cmd=8388867, .qc_type=0, .qc_id=8624, .qc_stat=0, .qc_dqinfo={.dqi_bgrace=0, .dqi_igrace=0, .dqi_flags=0, .dqi_valid=0}, .qc_dqblk={.dqb_bhardlimit=0, .dqb_bsoftlimit=0, .dqb_curspace=0, .dqb_ihardlimit=0, .dqb_isoftlimit=0, .dqb_curinodes=18446744073709551615, .dqb_btime=0, .dqb_itime=0, .dqb_valid=15, .dqb_padding=0}}
      osc_quotactl: "ls1-OST0039-osc-c0000003c865a400": {.qc_cmd=8388867, .qc_type=0, .qc_id=8624, .qc_stat=0, .qc_dqinfo={.dqi_bgrace=0, .dqi_igrace=0, .dqi_flags=0, .dqi_valid=0}, .qc_dqblk={.dqb_bhardlimit=0, .dqb_bsoftlimit=0, .dqb_curspace=0, .dqb_ihardlimit=0, .dqb_isoftlimit=0, .dqb_curinodes=0, .dqb_btime=0, .dqb_itime=0, .dqb_valid=15, .dqb_padding=0}}
      
      osc_quotactl: "ls1-OST0073-osc-c0000003c865a400": {.qc_cmd=8388867, .qc_type=0, .qc_id=8624, .qc_stat=0, .qc_dqinfo={.dqi_bgrace=0, .dqi_igrace=0, .dqi_flags=0, .dqi_valid=0}, .qc_dqblk={.dqb_bhardlimit=0, .dqb_bsoftlimit=0, .dqb_curspace=0, .dqb_ihardlimit=0, .dqb_isoftlimit=0, .dqb_curinodes=3, .dqb_btime=0, .dqb_itime=0, .dqb_valid=15, .dqb_padding=0}}
      osc_quotactl: "ls1-OST0074-osc-c0000003c865a400": {.qc_cmd=8388867, .qc_type=0, .qc_id=8624, .qc_stat=0, .qc_dqinfo={.dqi_bgrace=0, .dqi_igrace=0, .dqi_flags=0, .dqi_valid=0}, .qc_dqblk={.dqb_bhardlimit=0, .dqb_bsoftlimit=0, .dqb_curspace=0, .dqb_ihardlimit=0, .dqb_isoftlimit=0, .dqb_curinodes=18446744073709551615, .dqb_btime=0, .dqb_itime=0, .dqb_valid=15, .dqb_padding=0}}
      osc_quotactl: "ls1-OST0075-osc-c0000003c865a400": {.qc_cmd=8388867, .qc_type=0, .qc_id=8624, .qc_stat=0, .qc_dqinfo={.dqi_bgrace=0, .dqi_igrace=0, .dqi_flags=0, .dqi_valid=0}, .qc_dqblk={.dqb_bhardlimit=0, .dqb_bsoftlimit=0, .dqb_curspace=0, .dqb_ihardlimit=0, .dqb_isoftlimit=0, .dqb_curinodes=3, .dqb_btime=0, .dqb_itime=0, .dqb_valid=15, .dqb_padding=0}}
      

      Specifically, the values of dqb_curinodes:

      ls1-OST0074-osc-c0000003c865a400:dqb_curinodes=18446744073709551615
      ls1-OST0038-osc-c0000003c865a400:dqb_curinodes=18446744073709551615
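      For reference, output in that format can be collected with a SystemTap probe along the following lines. This is only a sketch, not the script actually used here: the parameter names (exp, oqctl) are assumptions based on the Lustre 2.4 osc_quotactl() prototype, and reading parameters in a .return probe can be unreliable.

      stap -e 'probe module("osc").function("osc_quotactl").return {
          printf("osc_quotactl: \"%s\": %s\n",
                 kernel_string($exp->exp_obd->obd_name), $oqctl$$)
      }'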
      


          Activity


            niu Niu Yawei (Inactive) added a comment -

            From the log we can see that the bogus value comes from the MDS, and that it is read from the ZAP object we created for inode accounting. Given that this problem happens only for inode accounting, I strongly suspect it is related to LU-2435. I think a temporary workaround is to set quota_iused_estimate to 1.

            Do you need lines from OSTs as well, or just from the MDS?

            No, I think MDS log is enough. Thank you.
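            For illustration, the workaround mentioned above would look roughly like the following on a ZFS-backed MDS. This is a sketch only; the exact parameter path for quota_iused_estimate is an assumption and should be checked against the LU-2435 patch.

            # enable estimated inode accounting on the osd-zfs layer (path assumed)
            lctl set_param osd-zfs.*.quota_iused_estimate=1
            # re-check the reported inode usage afterwards
            lfs quota -u pjmccart /p/ls1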


            marc@llnl.gov D. Marc Stearman (Inactive) added a comment -

            I enabled +quota debugging on the MDS. Then I ran this command:

            [root@surface86:~]# lfs quota -u weems2 /p/lscratche
            Disk quotas for user weems2 (uid 59519):
                 Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
               /p/lscratche      88       0       0       - 18446744073709551462       0       0       -
            [root@surface86:~]# 
            

            You can see that the files column is very large. I then dumped the debug logs on the MDS right after that. These are the lines from the quota debugging:

            00040000:04000000:5.0F:1421104120.034812:0:13022:0:(qmt_handler.c:65:qmt_get()) $$$ fetch settings qmt:lse-QMT0000 pool:0-md id:59519 enforced:0 hard:0 soft:0 granted:0 time:0 qunit:0 edquot:0 may_rel:0 revoke:0
            00040000:04000000:5.0:1421104120.034818:0:13022:0:(qmt_handler.c:65:qmt_get()) $$$ fetch settings qmt:lse-QMT0000 pool:0-dt id:59519 enforced:0 hard:0 soft:0 granted:0 time:0 qunit:0 edquot:0 may_rel:0 revoke:0
            00000001:04000000:4.0F:1421104120.068222:0:13022:0:(osd_quota.c:122:osd_acct_index_lookup()) lse-MDT0000: id:e87f, ispace:18446744073709551462, bspace:90112
            

            Do you need lines from OSTs as well, or just from the MDS?
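            For reference, the +quota debugging and log dump described at the start of this comment are typically done roughly as follows; the output file path here is illustrative only.

            # on the MDS: add the quota mask to the Lustre debug flags
            lctl set_param debug=+quota
            # ... reproduce the problem (lfs quota -u <USER> <FS>) ...
            # dump the kernel debug buffer to a file for inspection
            lctl dk /tmp/lustre-quota-debug.log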


            niu Niu Yawei (Inactive) added a comment -

            Thank you. I didn't see any difference in the client code.

            It looks like the debug patch (http://review.whamcloud.com/#/c/8191/) has been applied to the code. Is it possible to capture server logs with D_QUOTA enabled, so we can see whether the bogus value is returned from osd_acct_index_lookup()?


            morrone Christopher Morrone (Inactive) added a comment -

            In other words, tag 2.4.2-13chaos.


            marc@llnl.gov D. Marc Stearman (Inactive) added a comment -

            Yes, it is still happening on all of our file systems, so the current tag will work for you.


            prakash Prakash Surya (Inactive) added a comment -

            As always, our source and releases are on github: https://github.com/chaos/lustre

            As far as which releases were installed on the servers and clients in question, I'll have to ask the admins. Marc Stearman, can you double check this issue is still occurring on Sequoia and report back the version currently installed there?


            niu Niu Yawei (Inactive) added a comment -

            The bogus 'dqb_curinodes' comes from the OST; I'm wondering how it can contribute to the 'files' column of the 'lfs quota' output, because we only collect inode usage from the MDTs:

                                    /* collect space usage from OSTs */
                                    oqctl_tmp->qc_dqblk.dqb_curspace = 0;
                                    rc = obd_quotactl(sbi->ll_dt_exp, oqctl_tmp);
                                    if (!rc || rc == -EREMOTEIO) {
                                            oqctl->qc_dqblk.dqb_curspace =
                                                    oqctl_tmp->qc_dqblk.dqb_curspace;
                                            oqctl->qc_dqblk.dqb_valid |= QIF_SPACE;
                                    }
            
                                    /* collect space & inode usage from MDTs */
                                    oqctl_tmp->qc_dqblk.dqb_curspace = 0;
                                    oqctl_tmp->qc_dqblk.dqb_curinodes = 0;
                                    rc = obd_quotactl(sbi->ll_md_exp, oqctl_tmp);
                                    if (!rc || rc == -EREMOTEIO) {
                                            oqctl->qc_dqblk.dqb_curspace +=
                                                    oqctl_tmp->qc_dqblk.dqb_curspace;
                                            oqctl->qc_dqblk.dqb_curinodes =
                                                    oqctl_tmp->qc_dqblk.dqb_curinodes;
                                            oqctl->qc_dqblk.dqb_valid |= QIF_INODES;
                                    } else {
                                            oqctl->qc_dqblk.dqb_valid &= ~QIF_SPACE;
                                    }
            

            I did some local testing in which I made the OST return a fake 'curinodes' value to the client; however, the client ignored the fake value as expected.

            While investigating why the server returns a bogus value, I would like to verify that the client code you are running wasn't changed by some unexpected patch. Could you show me where to check the client code? (LLNL tree? which tag?) Thank you.


            niu Niu Yawei (Inactive) added a comment -

            I'm assuming the bad "dqb_curinodes" values uncovered by the client systemtap script are coming from the server; is that not the case?

            That's quite possible. I'm just not sure of the purpose of using a 'fail_loc' on the server to return a bad value to the client.


            prakash Prakash Surya (Inactive) added a comment -

            Well, I'm still unsure where the bad value is coming from, but my guess is it's coming from the server. I could be wrong, though.

            I'm assuming the bad "dqb_curinodes" values uncovered by the client systemtap script are coming from the server; is that not the case?


            niu Niu Yawei (Inactive) added a comment -

            What do you mean by using a 'fail_loc' on the server to return a bogus value to the client? I don't think the server is expected to return a bad value.


            prakash Prakash Surya (Inactive) added a comment -

            Yes, we see the same behavior on 2.1 and 2.4 clients. The server is 2.4 only, though. I don't know if the same would happen on a 2.1 server. We have a reproducer, but I think it is dependent on the server returning a "bad" value. Perhaps we can try to reproduce this in a VM setup, using a "fail_loc" on the server to return a bogus value to the client? I haven't tried that, but it might work.
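            For illustration, failure injection of this sort is normally driven through the fail_loc tunable. The fail_loc value, user name, and mount point below are placeholders, and a new OBD_FAIL check would have to be added in the server quota path for such an injection to return a bogus count.

            # on the server: arm a (hypothetical) failure-injection point
            lctl set_param fail_loc=0x2401
            # from a client: observe the injected bogus inode count
            lfs quota -u testuser /mnt/lustre
            # disarm the injection
            lctl set_param fail_loc=0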


            People

              niu Niu Yawei (Inactive)
              prakash Prakash Surya (Inactive)
              Votes: 0
              Watchers: 5
