
[LU-2619] Bogus value of dqb_curinodes returned by osc_quotactl

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Minor
    • None
    • Affects Version/s: Lustre 2.4.0
    • 3
    • 6124

    Description

      When running lfs quota -u <USER> <FS> on Sequoia, a couple of users do not have any files in their directories, but quota reports a bogus, very large value in the files column:

      # lfs quota -u pjmccart /p/ls1
      Disk quotas for user pjmccart (uid 8624):
           Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
               /p/ls1     913       0       0       - 18446744073709547865       0       0       -
      
      # du -sh /p/ls1/pjmccart/
      913K    /p/ls1/pjmccart/
      
      # ls -alR /p/ls1/pjmccart/
      /p/ls1/pjmccart/:
      total 1214
      913 drwx------    2 pjmccart pjmccart 934400 Nov 15 10:28 ./
      302 drwxr-xr-x 2193 root     root     308736 Jan 11 08:05 ../ 
      

      Using systemtap to print the obd_quotactl structure when the osc_quotactl function returns, I see odd values coming from two of the OSCs:

      osc_quotactl: "ls1-OST0037-osc-c0000003c865a400": {.qc_cmd=8388867, .qc_type=0, .qc_id=8624, .qc_stat=0, .qc_dqinfo={.dqi_bgrace=0, .dqi_igrace=0, .dqi_flags=0, .dqi_valid=0}, .qc_dqblk={.dqb_bhardlimit=0, .dqb_bsoftlimit=0, .dqb_curspace=0, .dqb_ihardlimit=0, .dqb_isoftlimit=0, .dqb_curinodes=0, .dqb_btime=0, .dqb_itime=0, .dqb_valid=15, .dqb_padding=0}}
      osc_quotactl: "ls1-OST0038-osc-c0000003c865a400": {.qc_cmd=8388867, .qc_type=0, .qc_id=8624, .qc_stat=0, .qc_dqinfo={.dqi_bgrace=0, .dqi_igrace=0, .dqi_flags=0, .dqi_valid=0}, .qc_dqblk={.dqb_bhardlimit=0, .dqb_bsoftlimit=0, .dqb_curspace=0, .dqb_ihardlimit=0, .dqb_isoftlimit=0, .dqb_curinodes=18446744073709551615, .dqb_btime=0, .dqb_itime=0, .dqb_valid=15, .dqb_padding=0}}
      osc_quotactl: "ls1-OST0039-osc-c0000003c865a400": {.qc_cmd=8388867, .qc_type=0, .qc_id=8624, .qc_stat=0, .qc_dqinfo={.dqi_bgrace=0, .dqi_igrace=0, .dqi_flags=0, .dqi_valid=0}, .qc_dqblk={.dqb_bhardlimit=0, .dqb_bsoftlimit=0, .dqb_curspace=0, .dqb_ihardlimit=0, .dqb_isoftlimit=0, .dqb_curinodes=0, .dqb_btime=0, .dqb_itime=0, .dqb_valid=15, .dqb_padding=0}}
      
      osc_quotactl: "ls1-OST0073-osc-c0000003c865a400": {.qc_cmd=8388867, .qc_type=0, .qc_id=8624, .qc_stat=0, .qc_dqinfo={.dqi_bgrace=0, .dqi_igrace=0, .dqi_flags=0, .dqi_valid=0}, .qc_dqblk={.dqb_bhardlimit=0, .dqb_bsoftlimit=0, .dqb_curspace=0, .dqb_ihardlimit=0, .dqb_isoftlimit=0, .dqb_curinodes=3, .dqb_btime=0, .dqb_itime=0, .dqb_valid=15, .dqb_padding=0}}
      osc_quotactl: "ls1-OST0074-osc-c0000003c865a400": {.qc_cmd=8388867, .qc_type=0, .qc_id=8624, .qc_stat=0, .qc_dqinfo={.dqi_bgrace=0, .dqi_igrace=0, .dqi_flags=0, .dqi_valid=0}, .qc_dqblk={.dqb_bhardlimit=0, .dqb_bsoftlimit=0, .dqb_curspace=0, .dqb_ihardlimit=0, .dqb_isoftlimit=0, .dqb_curinodes=18446744073709551615, .dqb_btime=0, .dqb_itime=0, .dqb_valid=15, .dqb_padding=0}}
      osc_quotactl: "ls1-OST0075-osc-c0000003c865a400": {.qc_cmd=8388867, .qc_type=0, .qc_id=8624, .qc_stat=0, .qc_dqinfo={.dqi_bgrace=0, .dqi_igrace=0, .dqi_flags=0, .dqi_valid=0}, .qc_dqblk={.dqb_bhardlimit=0, .dqb_bsoftlimit=0, .dqb_curspace=0, .dqb_ihardlimit=0, .dqb_isoftlimit=0, .dqb_curinodes=3, .dqb_btime=0, .dqb_itime=0, .dqb_valid=15, .dqb_padding=0}}
      

      Specifically, the values of dqb_curinodes:

      ls1-OST0074-osc-c0000003c865a400:dqb_curinodes=18446744073709551615
      ls1-OST0038-osc-c0000003c865a400:dqb_curinodes=18446744073709551615
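      For reference, output in that format can be collected with a SystemTap probe along the following lines. This is only a sketch, not the script actually used here: the parameter names (exp, oqctl) are assumptions based on the Lustre 2.4 osc_quotactl() prototype, and reading parameters in a .return probe can be unreliable.

      stap -e 'probe module("osc").function("osc_quotactl").return {
          printf("osc_quotactl: \"%s\": %s\n",
                 kernel_string($exp->exp_obd->obd_name), $oqctl$$)
      }'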
      


          Activity


            niu Niu Yawei (Inactive) added a comment -

            From the log we can see that the bogus value comes from the MDS, and that it is read from the ZAP object we created for inode accounting. Given that this problem happens only for inode accounting, I strongly suspect it is related to LU-2435. I think a temporary workaround is to set quota_iused_estimate to 1.

            Do you need lines from OSTs as well, or just from the MDS?

            No, I think MDS log is enough. Thank you.
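            For illustration, the workaround mentioned above would look roughly like the following on a ZFS-backed MDS. This is a sketch only; the exact parameter path for quota_iused_estimate is an assumption and should be checked against the LU-2435 patch.

            # enable estimated inode accounting on the osd-zfs layer (path assumed)
            lctl set_param osd-zfs.*.quota_iused_estimate=1
            # re-check the reported inode usage afterwards
            lfs quota -u pjmccart /p/ls1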


            marc@llnl.gov D. Marc Stearman (Inactive) added a comment -

            I enabled +quota debugging on the MDS. Then I ran this command:

            [root@surface86:~]# lfs quota -u weems2 /p/lscratche
            Disk quotas for user weems2 (uid 59519):
                 Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
               /p/lscratche      88       0       0       - 18446744073709551462       0       0       -
            [root@surface86:~]# 
            

            You can see that the files column is very large. I then dumped the debug logs on the MDS right after that. These are the lines from the quota debugging:

            00040000:04000000:5.0F:1421104120.034812:0:13022:0:(qmt_handler.c:65:qmt_get()) $$$ fetch settings qmt:lse-QMT0000 pool:0-md id:59519 enforced:0 hard:0 soft:0 granted:0 time:0 qunit:0 edquot:0 may_rel:0 revoke:0
            00040000:04000000:5.0:1421104120.034818:0:13022:0:(qmt_handler.c:65:qmt_get()) $$$ fetch settings qmt:lse-QMT0000 pool:0-dt id:59519 enforced:0 hard:0 soft:0 granted:0 time:0 qunit:0 edquot:0 may_rel:0 revoke:0
            00000001:04000000:4.0F:1421104120.068222:0:13022:0:(osd_quota.c:122:osd_acct_index_lookup()) lse-MDT0000: id:e87f, ispace:18446744073709551462, bspace:90112
            

            Do you need lines from OSTs as well, or just from the MDS?
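            For reference, the +quota debugging and log dump described at the start of this comment are typically done roughly as follows; the output file path here is illustrative only.

            # on the MDS: add the quota mask to the Lustre debug flags
            lctl set_param debug=+quota
            # ... reproduce the problem (lfs quota -u <USER> <FS>) ...
            # dump the kernel debug buffer to a file for inspection
            lctl dk /tmp/lustre-quota-debug.log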


            niu Niu Yawei (Inactive) added a comment -

            Thank you. I didn't see any difference in the client code.

            It looks like the debug patch (http://review.whamcloud.com/#/c/8191/) has been applied to the code. Is it possible to capture server logs with D_QUOTA enabled, so we can see whether the bogus value is returned from osd_acct_index_lookup()?


            morrone Christopher Morrone (Inactive) added a comment -

            In other words, tag 2.4.2-13chaos.


            marc@llnl.gov D. Marc Stearman (Inactive) added a comment -

            Yes, it is still happening on all of our file systems, so the current tag will work for you.


            prakash Prakash Surya (Inactive) added a comment -

            As always, our source and releases are on github: https://github.com/chaos/lustre

            As far as which releases were installed on the servers and clients in question, I'll have to ask the admins. Marc Stearman, can you double check this issue is still occurring on Sequoia and report back the version currently installed there?


            niu Niu Yawei (Inactive) added a comment -

            The bogus 'dqb_curinodes' comes from the OST; I'm wondering how it can contribute to the 'files' column of the 'lfs quota' output, because we only collect inode usage from the MDTs:

                                    /* collect space usage from OSTs */
                                    oqctl_tmp->qc_dqblk.dqb_curspace = 0;
                                    rc = obd_quotactl(sbi->ll_dt_exp, oqctl_tmp);
                                    if (!rc || rc == -EREMOTEIO) {
                                            oqctl->qc_dqblk.dqb_curspace =
                                                    oqctl_tmp->qc_dqblk.dqb_curspace;
                                            oqctl->qc_dqblk.dqb_valid |= QIF_SPACE;
                                    }
            
                                    /* collect space & inode usage from MDTs */
                                    oqctl_tmp->qc_dqblk.dqb_curspace = 0;
                                    oqctl_tmp->qc_dqblk.dqb_curinodes = 0;
                                    rc = obd_quotactl(sbi->ll_md_exp, oqctl_tmp);
                                    if (!rc || rc == -EREMOTEIO) {
                                            oqctl->qc_dqblk.dqb_curspace +=
                                                    oqctl_tmp->qc_dqblk.dqb_curspace;
                                            oqctl->qc_dqblk.dqb_curinodes =
                                                    oqctl_tmp->qc_dqblk.dqb_curinodes;
                                            oqctl->qc_dqblk.dqb_valid |= QIF_INODES;
                                    } else {
                                            oqctl->qc_dqblk.dqb_valid &= ~QIF_SPACE;
                                    }
            

            I did some local testing in which I made the OST return a fake 'curinodes' value to the client; however, the client ignored the fake value as expected.

            While investigating why the server returns a bogus value, I would like to verify that the client code you are running wasn't changed by some unexpected patch. Could you show me where to check the client code? (LLNL tree? which tag?) Thank you.


            niu Niu Yawei (Inactive) added a comment -

            I'm assuming the bad "dqb_curinodes" values uncovered by the client systemtap script are coming from the server; is that not the case?

            That's quite possible. I'm just not sure of the purpose of using a 'fail_loc' on the server to return a bad value to the client.


            prakash Prakash Surya (Inactive) added a comment -

            Well, I'm still unsure where the bad value is coming from, but my guess is it's coming from the server. I could be wrong, though.

            I'm assuming the bad "dqb_curinodes" values uncovered by the client systemtap script are coming from the server; is that not the case?


            niu Niu Yawei (Inactive) added a comment -

            What do you mean by using a 'fail_loc' on the server to return a bogus value to the client? I don't think the server is expected to return a bad value.


            prakash Prakash Surya (Inactive) added a comment -

            Yes, we see the same behavior on 2.1 and 2.4 clients. The server is 2.4 only, though. I don't know if the same would happen on a 2.1 server. We have a reproducer, but I think it is dependent on the server returning a "bad" value. Perhaps we can try to reproduce this in a VM setup, using a "fail_loc" on the server to return a bogus value to the client? I haven't tried that, but it might work.
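            For illustration, failure injection of this sort is normally driven through the fail_loc tunable. The fail_loc value, user name, and mount point below are placeholders, and a new OBD_FAIL check would have to be added in the server quota path for such an injection to return a bogus count.

            # on the server: arm a (hypothetical) failure-injection point
            lctl set_param fail_loc=0x2401
            # from a client: observe the injected bogus inode count
            lfs quota -u testuser /mnt/lustre
            # disarm the injection
            lctl set_param fail_loc=0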


            People

              niu Niu Yawei (Inactive)
              prakash Prakash Surya (Inactive)
              Votes: 0
              Watchers: 5
