[LU-11670] Incorrect size when using lockahead Created: 15/Nov/18 Updated: 26/Oct/20 Resolved: 04/Oct/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Patrick Farrell (Inactive) | Assignee: | Patrick Farrell (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
When running IOR with the patched MPICH to use lockahead, we pretty consistently get size miscompares on some systems, e.g.:
I've identified the bug and found a fix.

During development, Oleg pointed out that it was not necessary to send glimpse callbacks to all locks on a particular resource; instead, we could send one per client, because the client-side size check is not lock specific: it actually gets the size from the upper layers. This is true, but a caveat went unnoticed. If one client requests, say, two 1 MiB locks and writes into the first of them, and another client then stats the file, the server will only send a glimpse for the highest lock.* That higher lock has not been used for I/O and therefore does not have l_ast_data set, so the part of the glimpse callback that gets the size from the CLIO layers does not run. The second client will therefore see a file size of zero.

*Note that if we wait long enough, the write associated with the first lock will be flushed, and the server will then have an up-to-date size and return the correct value to the client. This is part of the reason the bug is timing dependent.

The fix is for the client, in the glimpse AST, to walk the granted lock list looking for a lock with l_ast_data set. If none is found, then either no writes actually used these locks or the object is being destroyed; either way, this client has no useful size information.
Patch forthcoming momentarily.
I'll leave this up to WC whether or not this should be a blocker, but it's probably worth considering as one. |
| Comments |
| Comment by Gerrit Updater [ 15/Nov/18 ] |
|
Patrick Farrell (paf@cray.com) uploaded a new patch: https://review.whamcloud.com/33660 |
| Comment by Gerrit Updater [ 20/Sep/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33660/ |
| Comment by Peter Jones [ 20/Sep/19 ] |
|
Landed for 2.13 |
| Comment by Andreas Dilger [ 25/Sep/19 ] |
|
The new sanityn test_103 has been causing intermittent test failures on master since this patch landed: |
| Comment by Peter Jones [ 25/Sep/19 ] |
|
paf any thoughts? |
| Comment by Andreas Dilger [ 25/Sep/19 ] |
Incorrect size expected (no glimpse fix):
Starting test test23 at 1569352103
Finishing test test23 at 1569352106
sanityn test_103: @@@@@@ FAIL: Lockahead test 23 passed with fail_loc set, 0 |
| Comment by Patrick Farrell [ 25/Sep/19 ] |
|
Oh, yuck. I checked a few of those, and they all appear to be ZFS. So presumably it's some sort of very long sync delay, or possibly the ZFS periodic commit interval is triggering in the middle of the test... I can't quite nail down what it is right now. But there's an easy way out.
This test is a little weird: it has two halves so it can sort of self-verify. The first half proves the problem exists, by showing that the bug occurs if you disable the fix. The second half verifies that the problem does not occur with the fix in place. The first half, the one that shows the bug exists, is the only one failing; basically, we are unexpectedly succeeding sometimes. And while I like having a self-verifying test, the truth is that very few of our tests are.

So a quick and simple solution that would not overly compromise things would be to remove everything after test_mkdir (line 4753 here: https://review.whamcloud.com/#/c/33660/31/lustre/tests/sanityn.sh) up to (but not including) line 4780, where we start setting the fail_loc. (I'm not going to push this patch at the moment, but I'd be happy to review it.) That would leave only the "positive" test, removing the "negative" one (i.e., the one that proves it's broken without the fix), which is the only one failing.

If someone really wants to debug this, I would add a printf with the stat information to lockahead_test.c, rather than only printing stat info when the clients disagree, and see whether the "agreeing" size on both clients is 0 or 1 MiB. Then, basically, dig through the ZFS code and see how the different ZFS sync behavior is conflicting with the test. |
| Comment by Jian Yu [ 26/Sep/19 ] |
|
I'm working on the patch to remove the first part of the test from sanityn test 103. |
| Comment by Jian Yu [ 26/Sep/19 ] |
|
Hi Patrick, how about the following change to sanityn.sh:
diff --git a/lustre/tests/sanityn.sh b/lustre/tests/sanityn.sh
index 7d4007534b..ae47c09d55 100755
--- a/lustre/tests/sanityn.sh
+++ b/lustre/tests/sanityn.sh
@@ -4768,7 +4768,7 @@ test_103() {
lockahead_test -d $DIR/$tdir -D $DIR2/$tdir -t $testnum -f $tfile
rc=$?
if [ $rc -eq 0 ]; then
- error "Lockahead test $testnum passed with fail_loc set, ${rc}"
+ echo "Lockahead test $testnum passed with fail_loc set, ${rc}"
fi
# guarantee write commit timeout has expired
|
| Comment by Patrick Farrell [ 26/Sep/19 ] |
|
Hi Jian, Ah, that's better than my suggestion. |
| Comment by Gerrit Updater [ 26/Sep/19 ] |
|
Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36303 |
| Comment by Andreas Dilger [ 01/Oct/19 ] |
|
NB: about 40% of the failures also appear to be on ldiskfs, not just ZFS. |
| Comment by Gerrit Updater [ 04/Oct/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36303/ |
| Comment by Peter Jones [ 04/Oct/19 ] |
|
ok - the correction to the patch has landed too |
| Comment by Gerrit Updater [ 08/Oct/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36406 |
| Comment by Gerrit Updater [ 08/Oct/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36407 |
| Comment by Gerrit Updater [ 12/Dec/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36406/ |
| Comment by Gerrit Updater [ 12/Dec/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36407/ |