Details

    • 3
    • 9223372036854775807

    Description

      When running IOR with the patched MPICH to use lockahead, we pretty consistently get size miscompares on some systems, eg:
      WARNING: inconsistent file size by different tasks.
      WARNING: Expected aggregate file size       = 5368709120.
      WARNING: Stat() of aggregate file size      = 5103419392.
      WARNING: Using actual aggregate bytes moved = 5368709120.
      This seems to be pretty timing dependent, as we don't see it at all on another system running the same software, and we didn't see it in our original testing, even though the bug is definitely present.

       

      I've identified the bug and found a fix.  During development, Oleg pointed out that it was not necessary to send glimpse callbacks to all locks on a particular resource, but rather we could send one per client, because the client size check is not lock specific - It actually gets the size from the upper layers.

      This is true, but there is a caveat that went unnoticed:
      This only happens if l_ast_data is set (see osc_ldlm_glimpse_ast), which is not true for speculatively requested locks (glimpse, lockahead - see osc_lock_enqueue & osc_enqueue_base) until they are used for I/O.

      This means that if one client requests, say, two 1 MiB locks, then writes in to the first of them, and another client stats the file, the server will only send a glimpse to the highest lock.*

      This higher lock has not been used for I/O and therefore does not have l_ast_data set, so the part of the glimpse callback that gets size from clio layers does not run.  So the second client will see a file size of zero.

      *Note that if we wait long enough, the write associated with the first lock will be flushed and the server will have up to date size and will return the correct value to the client.  Part of the reason this is timing dependent.

      The fix is for the client, in the glimpse AST, to walk the granted lock list looking for a lock with l_ast_data set.  If none is found, then either no writes actually used these locks, or the object is being destroyed - either way, this client doesn't have useful size information.

       

      Patch forthcoming momentarily.

       

      I'll leave this up to WC whether or not this should be a blocker, but it's probably worth considering as one.

      Attachments

        Issue Links

          Activity

            [LU-11670] Incorrect size when using lockahead

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36407/
            Subject: LU-11670 tests: do not fail the first half in sanityn test 103
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 9292c2a0d1159d3ab47a190799f73e1e1a09f77c

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36407/ Subject: LU-11670 tests: do not fail the first half in sanityn test 103 Project: fs/lustre-release Branch: b2_12 Current Patch Set: Commit: 9292c2a0d1159d3ab47a190799f73e1e1a09f77c

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36406/
            Subject: LU-11670 osc: glimpse - search for active lock
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 2548cb9e32bfca897de577f88836629f72641369

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36406/ Subject: LU-11670 osc: glimpse - search for active lock Project: fs/lustre-release Branch: b2_12 Current Patch Set: Commit: 2548cb9e32bfca897de577f88836629f72641369

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36407
            Subject: LU-11670 tests: do not fail the first half in sanityn test 103
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: ef91b632b93e8ea57b4fe41ee2e9b1306d48fb8f

            gerrit Gerrit Updater added a comment - Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36407 Subject: LU-11670 tests: do not fail the first half in sanityn test 103 Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: ef91b632b93e8ea57b4fe41ee2e9b1306d48fb8f

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36406
            Subject: LU-11670 osc: glimpse - search for active lock
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 7df7f1d5b2bf01643cdd55d6ed9fccaaf98a2c42

            gerrit Gerrit Updater added a comment - Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36406 Subject: LU-11670 osc: glimpse - search for active lock Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: 7df7f1d5b2bf01643cdd55d6ed9fccaaf98a2c42
            pjones Peter Jones added a comment -

            ok - the correction to the patch has landed too

            pjones Peter Jones added a comment - ok - the correction to the patch has landed too

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36303/
            Subject: LU-11670 tests: do not fail the first half in sanityn test 103
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a88d0aa76c62e3074a05e869c9f0ba7ac128300f

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36303/ Subject: LU-11670 tests: do not fail the first half in sanityn test 103 Project: fs/lustre-release Branch: master Current Patch Set: Commit: a88d0aa76c62e3074a05e869c9f0ba7ac128300f

            NB: there also appear to be about 40% of failures on ldiskfs, and not just ZFS.

            adilger Andreas Dilger added a comment - NB: there also appear to be about 40% of failures on ldiskfs, and not just ZFS.

            Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36303
            Subject: LU-11670 tests: do not fail the first half in sanityn test 103
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 5c382fc06b013044e5bc16c20d648ac467e6aa5e

            gerrit Gerrit Updater added a comment - Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36303 Subject: LU-11670 tests: do not fail the first half in sanityn test 103 Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 5c382fc06b013044e5bc16c20d648ac467e6aa5e

            Hi Jian,

            Ah, that's better than my suggestion.   Yeah, I like that idea.

            paf0186 Patrick Farrell added a comment - Hi Jian, Ah, that's better than my suggestion.   Yeah, I like that idea.
            yujian Jian Yu added a comment -

            Hi Patrick,
            What about just making the following change so as to keep the first half test and just print an info when the problem cannot be reproduced?

            sanityn.sh
            diff --git a/lustre/tests/sanityn.sh b/lustre/tests/sanityn.sh
            index 7d4007534b..ae47c09d55 100755
            --- a/lustre/tests/sanityn.sh
            +++ b/lustre/tests/sanityn.sh
            @@ -4768,7 +4768,7 @@ test_103() {
                    lockahead_test -d $DIR/$tdir -D $DIR2/$tdir -t $testnum -f $tfile
                    rc=$?
                    if [ $rc -eq 0 ]; then
            -               error "Lockahead test $testnum passed with fail_loc set, ${rc}"
            +               echo "Lockahead test $testnum passed with fail_loc set, ${rc}"
                    fi
             
                    # guarantee write commit timeout has expired
            
            yujian Jian Yu added a comment - Hi Patrick, What about just making the following change so as to keep the first half test and just print an info when the problem cannot be reproduced? sanityn.sh diff --git a/lustre/tests/sanityn.sh b/lustre/tests/sanityn.sh index 7d4007534b..ae47c09d55 100755 --- a/lustre/tests/sanityn.sh +++ b/lustre/tests/sanityn.sh @@ -4768,7 +4768,7 @@ test_103() { lockahead_test -d $DIR/$tdir -D $DIR2/$tdir -t $testnum -f $tfile rc=$? if [ $rc -eq 0 ]; then - error "Lockahead test $testnum passed with fail_loc set, ${rc}" + echo "Lockahead test $testnum passed with fail_loc set, ${rc}" fi # guarantee write commit timeout has expired

            People

              pfarrell Patrick Farrell (Inactive)
              paf Patrick Farrell (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: