[LU-12232] replay-ost-single test 6 fails with "space grew after dd: before:13442048 after_dd:13442048"

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.14.0
    • Lustre 2.13.0, Lustre 2.12.1
    • 3
    • 9223372036854775807

    Description

      replay-ost-single test 6 fails with "space grew after dd: before:X after_dd:Y" for some values of X and Y.

      Looking at the suite_log for a recent failure (logs at https://testing.whamcloud.com/test_sets/d53ee48a-665d-11e9-8bb1-52540065bddc), we see:

      CMD: trevis-45vm9 lctl set_param fail_loc=0x80000119
      fail_loc=0x80000119
      before: 13442048 after_dd: 13442048 took 20 seconds
       replay-ost-single test_6: @@@@@@ FAIL: space grew after dd: before:13442048 after_dd:13442048 
      

      In some of the failures the before and after values are the same; in others they differ.

      There are no errors in any of the node console logs.

      This failure looks like LU-4265. I've opened a new ticket because this test has failed with this error message six times in the past year and a half, and four of those six failures were seen this month. So either something landed recently that is increasing the frequency of this failure, or there is a different/new cause.

      There are several examples of this failure, but here are just a couple of additional links to logs
      https://testing.whamcloud.com/test_sets/bf51ebce-65f7-11e9-a6f9-52540065bddc
      https://testing.whamcloud.com/test_sets/043e5b44-62fd-11e9-aeec-52540065bddc
      https://testing.whamcloud.com/test_sets/4fd6a78e-6170-11e9-9720-52540065bddc

          Activity


            hongchao.zhang Hongchao Zhang added a comment -

            Searching the failures on Maloo shows that the new occurrences began on Sept 03, and all of them are with the ZFS backend. The ZFS versions are 0.7.13 and 0.8.1.

            jamesanunez James Nunez (Inactive) added a comment -

            It looks like we are still experiencing replay-ost-single test 6 failing, now with the modified error message "free grew after dd: before:15371264 after_dd:15371264". Please see https://testing.whamcloud.com/test_sets/6f49e8d2-07de-11ea-8e77-52540065bddc for one recent failure on b2_13.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34927/
            Subject: LU-12232 test: commit before df
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 989217db39b2832c48ae58503363a4939c115d5a

            gerrit Gerrit Updater added a comment -

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34927
            Subject: LU-12232 test: commit before df
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 584c4bb7dd05a6102bc1e567db03223b1b835bcf

            pjones Peter Jones added a comment -

            Landed for 2.13


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34808/
            Subject: LU-12232 test: commit before df
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f1cbfb96c820aa7e1e5a84176619679d696a117a

            gerrit Gerrit Updater added a comment -

            Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34808
            Subject: LU-12232 test: commit before df
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: fbefd48a492768c1c877bf11a164e3fddb2e67f9

            hongchao.zhang Hongchao Zhang added a comment -

            This issue is caused by a side effect of the previous test: the previous transactions are not yet committed when the "free disk space" is read before the "dd".
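
            The effect described in the comment above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the landed patch and not Lustre code: `commit_pending`, `free_kb`, `write_kb`, and `run_case` are hypothetical stand-ins, and the numbers are made up. The model assumes that space freed by an earlier test becomes visible to df only once its transaction commits, and that the dd write happens to flush those older transactions.

```shell
# Toy model (hypothetical, not Lustre code).
committed_free=1000   # free space that df can currently see
pending_free=200      # space freed by the previous test, not yet committed

# Stands in for forcing a commit on the OSTs.
commit_pending() {
	committed_free=$((committed_free + pending_free))
	pending_free=0
}

# Stands in for the df-based free-space helper.
free_kb() { echo "$committed_free"; }

# Stands in for dd; assumes the write also flushes older transactions.
write_kb() {
	commit_pending
	committed_free=$((committed_free - $1))
}

# run_case yes|no: optionally commit first, then sample, write, sample again.
run_case() {
	committed_free=1000
	pending_free=200
	if [ "$1" = yes ]; then
		commit_pending
	fi
	before=$(free_kb)
	write_kb 200
	after_dd=$(free_kb)
	if [ "$before" -gt "$after_dd" ]; then
		echo "PASS before:$before after_dd:$after_dd"
	else
		echo "FAIL before:$before after_dd:$after_dd"
	fi
}

run_case no    # sample free space while the old frees are still pending
run_case yes   # commit before sampling free space
```

            In this model the uncommitted case reports identical before and after values, which is the same shape as the failure in the description; committing before the first sample removes the stale reading.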

            pfarrell Patrick Farrell (Inactive) added a comment -

            Here's the failure check:

                    log "before: $before after_dd: $after_dd took $i seconds"
                    (( $before > $after_dd )) ||
                            error "space grew after dd: before:$before after_dd:$after_dd"

            It would be nice to rewrite this a bit when we fix it. These are actually checks on free space: the test verifies that free space didn't grow. It would be nice if the test made that clearer.

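
            One possible shape for the clearer check suggested above is sketched here. It is an illustration only: `check_free_shrank` and its variable names are hypothetical and are not part of test-framework.sh, and the second sample value in the usage lines is invented for the example.

```shell
# Hypothetical helper: names the compared quantity as free space, so the
# message cannot be misread as referring to used space.
check_free_shrank() {
	free_before=$1
	free_after=$2
	if [ "$free_after" -lt "$free_before" ]; then
		echo "OK: free space shrank by $((free_before - free_after))"
		return 0
	fi
	echo "FAIL: free space did not shrink after dd: free_before:$free_before free_after:$free_after"
	return 1
}

# Usage: the first call passes; the second repeats the values from the ticket.
check_free_shrank 13442048 13441024
check_free_shrank 13442048 13442048 || true
```

            The comparison is the same strict inequality as in the original check; only the naming and the message change.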
            pjones Peter Jones added a comment -

            Hongchao

            Could you please investigate?

            Peter


            People

              hongchao.zhang Hongchao Zhang
              jamesanunez James Nunez (Inactive)