
interop: sanity-flr test_0g: FAIL: error resync-ing file '/mnt/lustre/f0g.sanity-flr'

Details

    • Bug
    • Resolution: Duplicate
    • Blocker
    • None
    • Lustre 2.16.0
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for jianyu <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/0be154b6-6f0c-43fd-bf39-98965995eb91

      test_0g failed with the following error:

      lfs mirror mirror: fail to pread 0-4096 of mirror 1: Invalid argument (22)
      lfs mirror mirror: error reading bytes 0-4096 of mirror 1: Invalid argument (22)
      lfs mirror mirror: fail to mirror resync '/mnt/lustre/f0g.sanity-flr': Invalid argument (22)
       sanity-flr test_0g: @@@@@@ FAIL: error resync-ing file '/mnt/lustre/f0g.sanity-flr'
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master/4584 - 4.18.0-553.16.1.el8_10.x86_64
      servers: https://build.whamcloud.com/job/lustre-b2_15/94 - 4.18.0-553.5.1.el8_lustre.x86_64


      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-flr test_0g - error resync-ing file '/mnt/lustre/f0g.sanity-flr'

          Activity

            [LU-18366] interop: sanity-flr test_0g: FAIL: error resync-ing file '/mnt/lustre/f0g.sanity-flr'
            yujian Jian Yu added a comment -

            The above sanity-flr subtests passed in all of the full-part-1 test sessions that ran Lustre 2.16.0 RC5 clients against 2.15.5 servers.

            gerrit Gerrit Updater added a comment - - edited

            "Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56782
            Subject: LU-18366 llite: Revert "LU-18284 llite: disallow udio exceptions"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 6e0b76d7c8a3df177f46bf986c720a442c3db4b4

            pjones Peter Jones added a comment -

            Yes please - push the revert and we can assess options!

            yujian Jian Yu added a comment -

            Hi adilger,
            Should I push a patch to revert commit ff018bb77a37 (LU-18284 llite: disallow udio exceptions) from the master branch, or wait for a fix?

            yujian Jian Yu added a comment - - edited

            After reverting commit ff018bb77a37 (LU-18284 llite: disallow udio exceptions), the sanity-flr and sanity-sec regression failures disappeared, except for sanity-flr test_50b and test_50d:
            https://testing.whamcloud.com/test_sets/f7c4ae6a-1f2b-4007-b93d-c36d0eb65e44
            https://testing.whamcloud.com/test_sets/9c4b2f04-1f5c-444a-a99e-26cd2b78e391
            sanity tests also passed:
            https://testing.whamcloud.com/test_sets/8a856ba2-26b6-4339-ba14-488bb625ebea

            yujian Jian Yu added a comment -

            Lustre 2.16.0 RC1 client + 2.15.5 server:
            https://testing.whamcloud.com/test_sets/a6ff7b58-8ec6-41cc-b90e-9b19a56cbc1c

            sanity-flr: FAIL: test_50a data mismatch: \'51fe65470e46efc4d0e76202ab73737b  -\' vs. \'d41d8cd98f00b204e9800998ecf8427e  -\'
            sanity-flr: FAIL: test_50b data mismatch: \'84efddfa5f00dd4ec046a8dd30d2f17d  -\' vs. \'d41d8cd98f00b204e9800998ecf8427e  -\'
            sanity-flr: FAIL: test_50d data mismatch: \'8a9803d9e4b43b025672e518c2de48c4  -\' vs. \'d41d8cd98f00b204e9800998ecf8427e  -\'
            

            The above failures were reported in LU-18319, which was fixed in RC2.
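
            A side note on the test_50a/50b/50d lines: the second checksum in each, d41d8cd98f00b204e9800998ecf8427e, is the MD5 digest of empty input, so one side of each comparison apparently produced no data. The Python sketch below only checks that fact; the helper name is hypothetical and is not part of the Lustre test framework.

```python
import hashlib

# MD5 of zero bytes, i.e. what `md5sum` reports for an empty stream.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def is_empty_checksum(md5sum_output: str) -> bool:
    """Return True if an `md5sum`-style string ("<hex>  -") is the digest of empty input."""
    return md5sum_output.strip().split()[0] == EMPTY_MD5


# Values copied from the test_50a line of the RC1 session above.
print(is_empty_checksum("51fe65470e46efc4d0e76202ab73737b  -"))  # False
print(is_empty_checksum("d41d8cd98f00b204e9800998ecf8427e  -"))  # True
```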

            Lustre 2.16.0 RC2 client + 2.15.5 server:
            https://testing.whamcloud.com/test_sets/0be154b6-6f0c-43fd-bf39-98965995eb91

            sanity-flr: FAIL: test_0g error resync-ing file '/mnt/lustre/f0g.sanity-flr'
            sanity-flr: FAIL: test_0h error resync-ing file '/mnt/lustre/d0h.sanity-flr/f0h.sanity-flr'
            sanity-flr: FAIL: test_0j resync /mnt/lustre/f0j.sanity-flr failed
            sanity-flr: FAIL: test_37 1: mismatch: \'ec71c760694ca69c598ae00189683df8  -\' vs. \'9ef5b1d2eb35e01fc7ce02893d77ff4a  -\'
            sanity-flr: FAIL: test_38 valid mirror doesn't exist
            sanity-flr: FAIL: test_42 resync /mnt/lustre/d42.sanity-flr/f42.sanity-flr-1 failed
            sanity-flr: FAIL: test_50a data mismatch: \'e7785df2c515c75dd9d743d8adea5918  -\' vs. \'d41d8cd98f00b204e9800998ecf8427e  -\'
            sanity-flr: FAIL: test_50b data mismatch: \'267cb9cb0da2c81ebefa760cd599c5d0  -\' vs. \'d41d8cd98f00b204e9800998ecf8427e  -\'
            sanity-flr: FAIL: test_50d data mismatch: \'bcdb34bd846d5427496c6bc82f219a33  -\' vs. \'d41d8cd98f00b204e9800998ecf8427e  -\'
            sanity-flr: FAIL: test_61a cannot migrate /mnt/lustre/d61a.sanity-flr/f61a.sanity-flr
            sanity-flr: FAIL: test_61c cannot resync mirror /mnt/lustre/d61c.sanity-flr/f61c.sanity-flr
            sanity-flr: FAIL: test_70a fsx FLR file /mnt/lustre/d70a.sanity-flr/f70a.sanity-flr failed
            sanity-flr: FAIL: test_200a final resync failed
            sanity-flr: FAIL: test_200b final resync failed
            
            $ git log --oneline 2.16.0-RC1..2.16.0-RC2 | grep -v 'tests:'
            69a079d51f93 New RC 2.16.0-RC2
            659bb1d70431 LU-18070 sec: clear ACL caches if ACL empty
            209607fd7957 LU-18096 enc: ll_get_symlink overlay function
            2a5e8e355498 LU-18247 nodemap: initialize unused fields on disk
            6fe522d3d4f9 LU-17906 pltrpc: don't use non-uptodate peer at connect
            ff018bb77a37 LU-18284 llite: disallow udio exceptions
            13fd5ebef3a7 LU-18101 sec: fix ACL handling on recent kernels again
            66d93ce3e4fc LU-17251 test: improve parallel-scale rr_alloc test
            cf2c5fe27e90 LU-4315 doc: remove usage of lgroff-macros
            

            Let me revert the patch for LU-18284 to see if the issues will be resolved.
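
            To see which subtests are new between two sessions, the FAIL lines can be compared as sets. This is a minimal Python sketch, assuming the "suite: FAIL: test_N ..." line format shown above; the function name is illustrative, and the sample strings are only a subset of the lines quoted in this comment.

```python
import re

# Matches lines such as "sanity-flr: FAIL: test_0g error resync-ing file ..."
FAIL_RE = re.compile(r"^\s*(?P<suite>[\w-]+): FAIL: (?P<test>test_\w+)\b")


def failed_subtests(log: str) -> set[str]:
    """Collect "<suite> <test>" names from every FAIL line in a log excerpt."""
    found = set()
    for line in log.splitlines():
        m = FAIL_RE.match(line)
        if m:
            found.add(f"{m['suite']} {m['test']}")
    return found


# Subset of the RC1 and RC2 failure lines quoted above.
rc1_log = """
sanity-flr: FAIL: test_50a data mismatch
sanity-flr: FAIL: test_50b data mismatch
sanity-flr: FAIL: test_50d data mismatch
"""
rc2_log = """
sanity-flr: FAIL: test_0g error resync-ing file '/mnt/lustre/f0g.sanity-flr'
sanity-flr: FAIL: test_0h error resync-ing file '/mnt/lustre/d0h.sanity-flr/f0h.sanity-flr'
sanity-flr: FAIL: test_50a data mismatch
sanity-flr: FAIL: test_61a cannot migrate /mnt/lustre/d61a.sanity-flr/f61a.sanity-flr
"""

# Subtests that fail in the RC2 excerpt but not in the RC1 excerpt.
new_in_rc2 = sorted(failed_subtests(rc2_log) - failed_subtests(rc1_log))
print(new_in_rc2)
```

            Subtests that appear only in the later session are the ones worth checking against the commits that landed between the two release candidates.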


            adilger Andreas Dilger added a comment -

            I think these migrate/mirror/resync issues look like the failures I reopened LU-17525 for. They somehow relate to doing a DIO read from a page at the end of the file that may not be entirely within i_size (i.e. i_size is not a multiple of PAGE_SIZE). That used to work (and continues to work) with older clients and targets, but it appears these failures have started increasing at some point, possibly caused by patch https://review.whamcloud.com/56571 "LU-18284 llite: disallow udio exceptions".
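
            The condition described in this comment can be written out as simple arithmetic. The Python sketch below is illustrative only: it assumes a 4096-byte page (matching the "0-4096" range in the pread error), uses a hypothetical file size because the ticket does not give one, and does not reproduce what lfs or llite actually do.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, matching the 0-4096 range in the error


def last_page_is_partial(i_size: int, page_size: int = PAGE_SIZE) -> bool:
    """True if i_size is not a multiple of the page size."""
    return i_size % page_size != 0


def read_crosses_eof(offset: int, length: int, i_size: int) -> bool:
    """True if [offset, offset + length) starts inside the file but ends past i_size."""
    return offset < i_size < offset + length


i_size = 1000  # hypothetical file size; the real value is not stated in the ticket
print(last_page_is_partial(i_size))            # True
print(read_crosses_eof(0, PAGE_SIZE, i_size))  # True
```

            A page-sized read at offset 0 of such a file covers a page that is only partly within i_size, which is the case the comment suspects is now being rejected with EINVAL.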

            tappro Mikhail Pershin added a comment -

            Well, good to know it is not LU-17906 after all.
            yujian Jian Yu added a comment -

            The same failures occurred in Lustre 2.16.0 RC4:
            https://testing.whamcloud.com/test_sets/5e973afb-d35a-46df-91ea-c6c8d1410c47

            sanity-sec: FAIL: test_52 could not resync mirror
            sanity-sec: FAIL: test_59a verifying mirror failed (1)
            sanity-sec: FAIL: test_59b verifying mirror failed (1)
            
            sanity-flr: FAIL: test_0g error resync-ing file '/mnt/lustre/f0g.sanity-flr'
            sanity-flr: FAIL: test_0h error resync-ing file '/mnt/lustre/d0h.sanity-flr/f0h.sanity-flr'
            sanity-flr: FAIL: test_0j resync /mnt/lustre/f0j.sanity-flr failed
            sanity-flr: FAIL: test_37 1: mismatch: \'04d8acaa945a97ef3137c142bb27db92  -\' vs. \'7736938e3332e9479f34deacfc298d68  -\'
            sanity-flr: FAIL: test_42 resync /mnt/lustre/d42.sanity-flr/f42.sanity-flr-1 failed
            sanity-flr: FAIL: test_50b data mismatch: \'1e70156068f9c8a0e53b1f91fe3ae737  -\' vs. \'d41d8cd98f00b204e9800998ecf8427e  -\'
            sanity-flr: FAIL: test_50d data mismatch: \'ebec484d45a1e0611b3100548350e382  -\' vs. \'d41d8cd98f00b204e9800998ecf8427e  -\'
            sanity-flr: FAIL: test_61a cannot migrate /mnt/lustre/d61a.sanity-flr/f61a.sanity-flr
            sanity-flr: FAIL: test_61c cannot resync mirror /mnt/lustre/d61c.sanity-flr/f61c.sanity-flr
            sanity-flr: FAIL: test_70a fsx FLR file /mnt/lustre/d70a.sanity-flr/f70a.sanity-flr failed
            sanity-flr: FAIL: test_200a final resync failed
            sanity-flr: FAIL: test_200b final resync failed
            
            yujian Jian Yu added a comment - - edited

            The same failures occurred in Lustre 2.16.0 RC3:
            https://testing.whamcloud.com/test_sets/93c3c173-4e95-4731-9a34-f2d23c9368d7
            https://testing.whamcloud.com/test_sets/3f4c06f7-76c6-4447-bbd8-42f84620c3c5
            https://testing.whamcloud.com/test_sets/476e39bd-ecc0-496f-aa6e-f7bf05acee9c

            People

              tappro Mikhail Pershin
              maloo Maloo