[LU-6247] osd-zfs: sanity-sec test_16, test_22: 27919 != 26845 + 1M after write Created: 14/Feb/15 Updated: 18/Aug/15 Resolved: 30/Mar/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Nathaniel Clark |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 17497 |
| Description |
|
This issue was created by maloo for Isaac Huang <he.huang@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/f61d2cb6-b2a2-11e4-a8f5-5254006e85c2.

The sub-test test_22 failed with the following error:

27919 != 26845 + 1M after write

It seemed that the test was using 4096 as the blocksize for the ZFS OST to estimate quota usage, which would fail. On the client:

[root@eagle-44vm2 ~]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff88007a5bb800/import
import:
name: lustre-OST0000-osc-ffff88007a5bb800
target: lustre-OST0000_UUID
state: FULL
connect_flags: [ write_grant, server_lock, version, request_portal, truncate_lock, max_byte_per_rpc, early_lock_cancel, adaptive_timeouts, lru_resize, alt_checksum_algorithm, fid_is_enabled, version_recovery, full20, layout_lock, 64bithash, object_max_bytes, jobstats, einprogress, lvb_type, lfsck ]
connect_data:
flags: 0x404af0e3440478
instance: 1
target_version: 2.6.94.0
initial_grant: 1048576
max_brw_size: 4194304
grant_block_size: 0
grant_inode_size: 0
grant_extent_overhead: 0
cksum_types: 0x2
max_easize: 32768
max_object_bytes: 9223372036854775807
import_flags: [ replayable, pingable, connect_tried ]
connection:
failover_nids: [ 10.100.4.126@tcp ]
current_connection: 10.100.4.126@tcp
connection_attempts: 1
generation: 1
in-progress_invalidations: 0
rpcs:
inflight: 0
unregistering: 0
timeouts: 0
avg_waittime: 1702 usec
service_estimates:
services: 1 sec
network: 1 sec
transactions:
last_replay: 0
peer_committed: 0
last_checked: 0
[root@eagle-44vm2 ~]# lctl get_param -n osc.lustre-OST0000-*.blocksize
4096
On the OSS:

[root@eagle-44vm1 lustre]# cat /proc/fs/lustre/osd-zfs/lustre-OST0000/blocksize
131072

This seems to be the code that resets the blocksize to 4K:

int ofd_statfs(const struct lu_env *env, struct obd_export *exp,
......
if (obd->obd_self_export != exp && ofd_grant_compat(exp, ofd)) {
/* clients which don't support OBD_CONNECT_GRANT_PARAM
* should not see a block size > page size, otherwise
* cl_lost_grant goes mad. Therefore, we emulate a 4KB (=2^12)
* block size which is the biggest block size known to work
* with all client's page size. */
osfs->os_blocks <<= ofd->ofd_blockbits - COMPAT_BSIZE_SHIFT;
osfs->os_bfree <<= ofd->ofd_blockbits - COMPAT_BSIZE_SHIFT;
osfs->os_bavail <<= ofd->ofd_blockbits - COMPAT_BSIZE_SHIFT;
osfs->os_bsize = 1 << COMPAT_BSIZE_SHIFT;
}
I'm not familiar with the code, but if the code is correct then the test should be fixed or skipped for ZFS OSTs. |
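Since ofd_statfs() only exports the emulated 4KB value to clients that lack OBD_CONNECT_GRANT_PARAM, a test that wants the real backend block size has to ask the OSD on the server. A minimal sketch, assuming the do_facet/ost1 helpers from test-framework.sh and the osd parameter path shown above:

# hedged sketch: query the backend block size from the OSD on the OSS,
# falling back to the client-visible OSC value if the server query fails
ost_bs=$(do_facet ost1 "lctl get_param -n osd-*.$FSNAME-OST0000.blocksize" 2>/dev/null |
	 head -n1)
[ -n "$ost_bs" ] ||
	ost_bs=$(lctl get_param -n osc.$FSNAME-OST0000-*.blocksize | head -n1)
echo "backend block size: $ost_bs"

On this run that would return 131072 from the OSS rather than the 4096 the OSC reports, which is the value a fuzz calculation would actually need.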
| Comments |
| Comment by Andreas Dilger [ 17/Feb/15 ] |
|
It looks like test_22 is a new test added for nodemap in 2.7, and it has been failing intermittently in testing on both ldiskfs and zfs. The test checks that writing 1MB of data causes exactly 1MB of increase in quota usage for the specified (mapped) UID. In this case 1074 * 1024 bytes of quota were consumed, 50KB more than expected (between 32KB and 70KB extra in other failures), almost certainly due to indirect blocks. The test does try to give some margin for error, but I don't think it is quite right, or large enough:

do_fops_quota_test() {
# define fuzz as 2x ost block size in K
local quota_fuzz=$(($(lctl get_param -n \
osc.$FSNAME-OST0000-*.blocksize | head -1) / 512))
local qused_orig=$(nodemap_check_quota "$run_u")
local qused_low=$((qused_orig - quota_fuzz))
local qused_high=$((qused_orig + quota_fuzz))
$run_u dd if=/dev/zero of=$testfile bs=1M count=1 >& /dev/null
sync; sync_all_data || true
local qused_new=$(nodemap_check_quota "$run_u")
[ $((qused_low + 1024)) -le $((qused_new)) \
-a $((qused_high + 1024)) -ge $((qused_new)) ] ||
error "$qused_new != $qused_orig + 1M after write"
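As a hedged illustration (not part of sanity-sec) of why that margin is too small: the "2x ost block size in K" formula gives only 8KB of slack for the 4096-byte blocksize the OSC reports, while the failure above shows roughly 50KB of extra usage; computed from the 128KB records the ZFS OSD actually uses, it would give 256KB:

# hedged illustration of the fuzz arithmetic only, not test code
for bs in 4096 131072; do
	echo "blocksize $bs -> quota_fuzz = $((bs / 512))K"
done
# blocksize 4096   -> quota_fuzz = 8K
# blocksize 131072 -> quota_fuzz = 256K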
I think the check as written is a bit confusing (i.e. inverted logic from what I'd expect), and should also consider indirect blocks and other minor overhead:

# allow 128KB margin for indirect blocks and other overhead
[ $qused_new -lt $((qused_low + 1024)) \
-o $qused_new -gt $((qused_high + 1024 + 128)) ] &&
error "$qused_new != $qused_orig + 1M after write"
The other related problem is that once this test fails, it doesn't clean up after itself properly, and the following tests up to test_23 will also fail with "adding fops nodemaps failed 1". The tests should add a trap for a cleanup helper so that the later tests can run properly. |
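A minimal sketch of that trap-based cleanup, where the helper name cleanup_fops_quota_test and the delete_fops_nodemaps teardown call are hypothetical stand-ins for whatever undoes the fops nodemap setup:

# hedged sketch: make sure the fops nodemap setup is torn down even when
# the test fails partway through
cleanup_fops_quota_test() {
	trap 0
	# hypothetical teardown of the nodemaps created for the fops tests
	delete_fops_nodemaps || true
}

test_22() {
	trap cleanup_fops_quota_test EXIT
	# ... existing test body ...
	cleanup_fops_quota_test
}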
| Comment by Kit Westneat [ 20/Feb/15 ] |
|
I looked on maloo for test failures on ldiskfs with this cause, but I couldn't find any; it looks like all the sanity-sec failures on ldiskfs are from other causes. I can invert the check if you prefer it that way. The fuzz parameters were actually added to account for the indirect blocks and overhead. I have it fuzz both the low and the high bound, but I suppose it really only makes sense to fuzz when the quota is higher than expected. If the quota used is lower than expected, it can't really be accounted for as filesystem overhead, right? If that's the case, I can just remove the qused_low parameter, since it seems to be confusing. |
| Comment by Andreas Dilger [ 21/Feb/15 ] |
|
The lower fuzz doesn't make sense for ldiskfs, though in theory it might make sense for ZFS if it had block compression enabled; we don't do that now, so we may as well remove it. I thought I saw ldiskfs test failures when I was looking, but I can't check right now. I didn't check yet what the ZFS OSTs return for the blocksize. It may be 128KB now, and up to 1MB in the future. As for the /proc filename, the OSC one should be ok on the client, but it may be that this was removed by John H. recently? Definitely the osd one will work on the server. |
| Comment by Gerrit Updater [ 23/Feb/15 ] |
|
Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/13844 |
| Comment by Kit Westneat [ 24/Feb/15 ] |
|
It looks like there was an issue with quota reclaiming in the last test - the root user was unable to reclaim 2k (reclaimed 1022k vs the 1024k expected). For ldiskfs, it seems like this might happen if the directory record needed to be expanded due to the number of deleted dentries. Or do directory records ever reclaim deleted dentries? Do you have any thoughts as to why this might happen for ZFS? What do you think about having a 4k lower fuzz? |
| Comment by Andreas Dilger [ 25/Feb/15 ] |
|
Definitely there is a need for fuzz in all space accounting tests. There are Lustre LLOG records written, the object index, and other files that are not freed when a file is deleted. As long as the test file is large enough that we can be certain the remaining space is fuzz and not the file failing to be deleted (I think the current test is ok in this regard), then the accounting doesn't need to be totally accurate. In fact, some tests use a fuzz much larger than a single block. It might make sense to have an "official fuzz" for each backend, or helper routines to check whether a value is within the limits, instead of reinventing this for every test that checks space limits. I'm not sure if there is already something like this in sanity-quota.sh, but if so it probably makes sense to use it. |
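A minimal sketch of such a within-limits helper (the name check_quota_grew is hypothetical, not an existing test-framework function; all values in KB):

# hedged sketch: check that usage grew by $expected KB, allowing up to
# $fuzz KB of backend overhead on top and none below
check_quota_grew() {
	local old=$1 new=$2 expected=$3 fuzz=$4

	[ "$new" -ge $((old + expected)) -a \
	  "$new" -le $((old + expected + fuzz)) ] ||
		error "quota grew from ${old}K to ${new}K, expected +${expected}K (fuzz ${fuzz}K)"
}
# e.g.: check_quota_grew $qused_orig $qused_new 1024 128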
| Comment by Isaac Huang (Inactive) [ 25/Feb/15 ] |
|
There's a fs_log_size() in test-framework.sh. |
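For example, a hedged rework of the test_22 check could take its margin from that helper instead of the OSC blocksize (the per-backend value fs_log_size() returns isn't shown in this ticket):

# hedged sketch: use the framework's per-backend margin as the upper fuzz
fuzz=$(fs_log_size)
[ $qused_new -lt $((qused_orig + 1024)) \
	-o $qused_new -gt $((qused_orig + 1024 + fuzz)) ] &&
	error "$qused_new != $qused_orig + 1M after write"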
| Comment by Gerrit Updater [ 27/Mar/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13844/ |
| Comment by Nathaniel Clark [ 30/Mar/15 ] |
|
Patch landed to master |