[LU-6247] osd-zfs: sanity-sec test_16, test_22: 27919 != 26845 + 1M after write - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Minor
Fix Version/s: Lustre 2.8.0
Affects Version/s: Lustre 2.7.0
Labels:
- patch

Severity:
3
Rank (Obsolete):
17497

Description

This issue was created by maloo for Isaac Huang <he.huang@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/f61d2cb6-b2a2-11e4-a8f5-5254006e85c2.

The sub-test test_22 failed with the following error:

27919 != 26845 + 1M after write

It seemed that the test was using 4096 as blocksize for the ZFS OST to estimate quota, which would fail.

On the client:

[root@eagle-44vm2 ~]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff88007a5bb800/import
import:
    name: lustre-OST0000-osc-ffff88007a5bb800
    target: lustre-OST0000_UUID
    state: FULL
    connect_flags: [ write_grant, server_lock, version, request_portal, truncate_lock, max_byte_per_rpc, early_lock_cancel, adaptive_timeouts, lru_resize, alt_checksum_algorithm, fid_is_enabled, version_recovery, full20, layout_lock, 64bithash, object_max_bytes, jobstats, einprogress, lvb_type, lfsck ]
    connect_data:
       flags: 0x404af0e3440478
       instance: 1
       target_version: 2.6.94.0
       initial_grant: 1048576
       max_brw_size: 4194304
       grant_block_size: 0
       grant_inode_size: 0
       grant_extent_overhead: 0
       cksum_types: 0x2
       max_easize: 32768
       max_object_bytes: 9223372036854775807
    import_flags: [ replayable, pingable, connect_tried ]
    connection:
       failover_nids: [ 10.100.4.126@tcp ]
       current_connection: 10.100.4.126@tcp
       connection_attempts: 1
       generation: 1
       in-progress_invalidations: 0
    rpcs:
       inflight: 0
       unregistering: 0
       timeouts: 0
       avg_waittime: 1702 usec
    service_estimates:
       services: 1 sec
       network: 1 sec
    transactions:
       last_replay: 0
       peer_committed: 0
       last_checked: 0
[root@eagle-44vm2 ~]# lctl get_param -n osc.lustre-OST0000-*.blocksize
4096

On the OSS:

[root@eagle-44vm1 lustre]# cat /proc/fs/lustre/osd-zfs/lustre-OST0000/blocksize 
131072

This seems to be the code that resets blocksize to 4K

int ofd_statfs(const struct lu_env *env,  struct obd_export *exp,
......
        if (obd->obd_self_export != exp && ofd_grant_compat(exp, ofd)) {
                /* clients which don't support OBD_CONNECT_GRANT_PARAM
                 * should not see a block size > page size, otherwise
                 * cl_lost_grant goes mad. Therefore, we emulate a 4KB (=2^12)
                 * block size which is the biggest block size known to work
                 * with all client's page size. */
                osfs->os_blocks <<= ofd->ofd_blockbits - COMPAT_BSIZE_SHIFT;
                osfs->os_bfree  <<= ofd->ofd_blockbits - COMPAT_BSIZE_SHIFT;
                osfs->os_bavail <<= ofd->ofd_blockbits - COMPAT_BSIZE_SHIFT;
                osfs->os_bsize    = 1 << COMPAT_BSIZE_SHIFT;
        }

I'm not familiar with the code, but if the code is correct then the test should be fixed or skipped for ZFS OSTs.

Attachments

Activity

[LU-6247] osd-zfs: sanity-sec test_16, test_22: 27919 != 26845 + 1M after write

Andreas Dilger added a comment - 25/Feb/15 1:39 AM

Definitely there is a need for fuzz in all space accoubting tests. There are Lustre LLOG records written, object index, and other files that are not freed when a file is deleted. As long as the test file is large enough that we can be certain the remaining space is fuzz and not the file failing to be deleted (I think the current test is ok in this regard) then the accounting doesn't need to be totally accurate.

In fact, some tests are having a fuzz much larger than a single block. It might make sense to have an "official fuzz" for each backend, or helper routines to check if a value is within the limits instead of reinventing this for every test that is checking space limits. Not sure if there is something in sanity-quota.sh in this regard already or not, but if yes then it probably makes sense to use them.

Andreas Dilger added a comment - 25/Feb/15 1:39 AM Definitely there is a need for fuzz in all space accoubting tests. There are Lustre LLOG records written, object index, and other files that are not freed when a file is deleted. As long as the test file is large enough that we can be certain the remaining space is fuzz and not the file failing to be deleted (I think the current test is ok in this regard) then the accounting doesn't need to be totally accurate. In fact, some tests are having a fuzz much larger than a single block. It might make sense to have an "official fuzz" for each backend, or helper routines to check if a value is within the limits instead of reinventing this for every test that is checking space limits. Not sure if there is something in sanity-quota.sh in this regard already or not, but if yes then it probably makes sense to use them.

Kit Westneat (Inactive) added a comment - 24/Feb/15 3:39 PM

It looks like there was an issue with quota reclaiming in the last test - the root user was unable to reclaim 2k (reclaimed 1022k vs the 1024k expected).

For ldiskfs, it seems like this might happen if the directory record needed to be expanded due to the number of deleted dentries. Or do directory records ever reclaim deleted dentries? Do you have any thoughts as to why this might happen for ZFS?

What do you think about having a 4k lower fuzz?

Kit Westneat (Inactive) added a comment - 24/Feb/15 3:39 PM It looks like there was an issue with quota reclaiming in the last test - the root user was unable to reclaim 2k (reclaimed 1022k vs the 1024k expected). For ldiskfs, it seems like this might happen if the directory record needed to be expanded due to the number of deleted dentries. Or do directory records ever reclaim deleted dentries? Do you have any thoughts as to why this might happen for ZFS? What do you think about having a 4k lower fuzz?

Gerrit Updater added a comment - 23/Feb/15 8:27 PM

Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/13844
Subject: ~~LU-6247~~ tests: fix nodemap quota test to use correct blocksize
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 525c3d8143acd5d1cd2612a858ba9f872792e0d4

Gerrit Updater added a comment - 23/Feb/15 8:27 PM Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/13844 Subject: LU-6247 tests: fix nodemap quota test to use correct blocksize Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 525c3d8143acd5d1cd2612a858ba9f872792e0d4

Andreas Dilger added a comment - 21/Feb/15 10:56 PM

The lower fuzz doesn't make sense for ldiskfs, though in theory it might make sense for ZFS if it has block compression enabled, but we don't do that now, so may as well remove it. I though I saw ldiskfs test failures when I was looking, but I can't check right now.

I didn't check yet what the ZFS OSTs return for the blocksize. It may be 128KB now, and up to 1MB in the future.

As for the /proc filename, the OSC one should be ok on the client,but it may be that this was removed by John H. recently? Definitely the osd one will work on the server.

Andreas Dilger added a comment - 21/Feb/15 10:56 PM The lower fuzz doesn't make sense for ldiskfs, though in theory it might make sense for ZFS if it has block compression enabled, but we don't do that now, so may as well remove it. I though I saw ldiskfs test failures when I was looking, but I can't check right now. I didn't check yet what the ZFS OSTs return for the blocksize. It may be 128KB now, and up to 1MB in the future. As for the /proc filename, the OSC one should be ok on the client,but it may be that this was removed by John H. recently? Definitely the osd one will work on the server.

Kit Westneat (Inactive) added a comment - 20/Feb/15 10:50 PM - edited

I looked on maloo for test failures on ldiskfs with this cause, but I couldn't find any. It looks like all the sanity-sec failures on ldiskfs are ~~LU-5423~~ or ~~LU-6106~~. Do you think it's worth trying to just modify the fuzz on ZFS? I think on one of the OpenSFS calls that's what we decided to do, but it looks like I used the wrong blocksize parameter (osc.$FSNAME-OST0000-*.blocksize vs osd-*.$FSNAME-OST0000.blocksize).

I can invert the check if you prefer it that way. The fuzz parameters were actually added to account for the indirect blocks and overhead. I have it fuzz the low and high, but I suppose it really only makes sense to fuzz if the quota is higher than expected. If the quota used is lower than expected, it can't really be accounted for as filesystem overhead, right? If that's the case, I can just remove the qused_low parameter, as it seems like it's confusing.

Kit Westneat (Inactive) added a comment - 20/Feb/15 10:50 PM - edited I looked on maloo for test failures on ldiskfs with this cause, but I couldn't find any. It looks like all the sanity-sec failures on ldiskfs are LU-5423 or LU-6106 . Do you think it's worth trying to just modify the fuzz on ZFS? I think on one of the OpenSFS calls that's what we decided to do, but it looks like I used the wrong blocksize parameter (osc.$FSNAME-OST0000-*.blocksize vs osd-*.$FSNAME-OST0000.blocksize). I can invert the check if you prefer it that way. The fuzz parameters were actually added to account for the indirect blocks and overhead. I have it fuzz the low and high, but I suppose it really only makes sense to fuzz if the quota is higher than expected. If the quota used is lower than expected, it can't really be accounted for as filesystem overhead, right? If that's the case, I can just remove the qused_low parameter, as it seems like it's confusing.

Andreas Dilger added a comment - 17/Feb/15 8:02 PM

It looks like test_22 a new test added for nodemap in 2.7, and this has been failing intermittently in testing on both ldiskfs and zfs. The test is checking that writing 1MB of data causes exactly 1MB of increase in quota usage for the specified (mapped) UID.

It appears that this is causing 1074 * 1024 bytes of quota to be consumed in this case, 50KB more than expected (between 32-70KB in other failures), almost certainly due to indirect blocks. It seems that the test is trying to give some margin for error, but I don't think it is quite right, or large enough:

do_fops_quota_test() {
        # define fuzz as 2x ost block size in K
        local quota_fuzz=$(($(lctl get_param -n \
                osc.$FSNAME-OST0000-*.blocksize | head -1) / 512))
        local qused_orig=$(nodemap_check_quota "$run_u")
        local qused_low=$((qused_orig - quota_fuzz))
        local qused_high=$((qused_orig + quota_fuzz))
        $run_u dd if=/dev/zero of=$testfile bs=1M count=1 >& /dev/null
        sync; sync_all_data || true

        local qused_new=$(nodemap_check_quota "$run_u")
        [ $((qused_low + 1024)) -le $((qused_new)) \
                -a $((qused_high + 1024)) -ge $((qused_new)) ] ||
                error "$qused_new != $qused_orig + 1M after write"

I think the check as written is a bit confusing (i.e. inverted logic from what I'd expect), and should also consider indirect blocks and other minor overhead:

        # allow 128KB margin for indirect blocks and other overhead
        [ $qused_new -lt $((qused_low + 1024))  \
                -o $qused_new -gt $((qused_high + 1024 + 128)) ] &&
                error "$qused_new != $qused_orig + 1M after write"

The other related problem is that once this test fails, it doesn't clean up after itself properly, and the following tests up to test_23 will also fail with "adding fops nodemaps failed 1". The tests should add a trap for a cleanup helper so that the later tests can run properly.

Andreas Dilger added a comment - 17/Feb/15 8:02 PM It looks like test_22 a new test added for nodemap in 2.7, and this has been failing intermittently in testing on both ldiskfs and zfs. The test is checking that writing 1MB of data causes exactly 1MB of increase in quota usage for the specified (mapped) UID. It appears that this is causing 1074 * 1024 bytes of quota to be consumed in this case, 50KB more than expected (between 32-70KB in other failures), almost certainly due to indirect blocks. It seems that the test is trying to give some margin for error, but I don't think it is quite right, or large enough: do_fops_quota_test() { # define fuzz as 2x ost block size in K local quota_fuzz=$(($(lctl get_param -n \ osc.$FSNAME-OST0000-*.blocksize | head -1) / 512)) local qused_orig=$(nodemap_check_quota "$run_u") local qused_low=$((qused_orig - quota_fuzz)) local qused_high=$((qused_orig + quota_fuzz)) $run_u dd if=/dev/zero of=$testfile bs=1M count=1 >& /dev/null sync; sync_all_data || true local qused_new=$(nodemap_check_quota "$run_u") [ $((qused_low + 1024)) -le $((qused_new)) \ -a $((qused_high + 1024)) -ge $((qused_new)) ] || error "$qused_new != $qused_orig + 1M after write" I think the check as written is a bit confusing (i.e. inverted logic from what I'd expect), and should also consider indirect blocks and other minor overhead: # allow 128KB margin for indirect blocks and other overhead [ $qused_new -lt $((qused_low + 1024)) \ -o $qused_new -gt $((qused_high + 1024 + 128)) ] && error "$qused_new != $qused_orig + 1M after write" The other related problem is that once this test fails, it doesn't clean up after itself properly, and the following tests up to test_23 will also fail with "adding fops nodemaps failed 1". The tests should add a trap for a cleanup helper so that the later tests can run properly.

People

Assignee:: Nathaniel Clark

Reporter:: Maloo

Votes:: 0 Vote for this issue

Watchers:: 10 Start watching this issue

Dates

Created:: 14/Feb/15 1:19 AM

Updated:: 18/Aug/15 2:14 PM

Resolved:: 30/Mar/15 12:42 PM