
sanity-dom: test_fsx failed with 120

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.5, Lustre 2.12.6
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This issue was created by maloo for Lai Siyao <lai.siyao@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/86c090e6-830a-11e9-8c65-52540065bddc

      == sanity-dom test fsx: Dual-mount fsx with DoM files ================================================ 16:12:36 (1559232756)
      Chance of close/open is 1 in 50
      Seed set to 2757
      fd 0: /mnt/lustre/ffsx.sanity-dom
      fd 1: /mnt/lustre2/ffsx.sanity-dom
      truncating to largest ever: 0xd3ab8
      000001[1] 1559232757.193757 trunc from 00000000  to  0x0d3ab7	(0xd3ab8 bytes)
             1559232757.196511 trunc done
      000002[0] 1559232757.197183 trunc from 0x0d3ab8  to  0x1a6bda	(0xd3123 bytes)
             1559232757.200406 trunc done
      000003[0] 1559232757.203269 write      0x03f41a thru 0x048669	(0x9250 bytes)
             1559232757.205390 write done
      000004[0] 1559232757.206069 mapread    0x0634ca thru 0x06bc8d	(0x87c4 bytes)
             1559232757.206138 mmap done
             1559232757.230198 memcpy done
             1559232757.230234 munmap done
      000005[1] 1559232757.232410 write      0x027f54 thru 0x03489f	(0xc94c bytes)
             1559232757.238426 write done
      Size error: expected 0xd3123 stat 0xd3ab8 seek 0xd3ab8
      LOG DUMP (5 total operations):
      1[1]: 1559232757.193757 TRUNCATE UP	from 0x0 to 0xd3ab8
      2[0]: 1559232757.197183 TRUNCATE DOWN	from 0xd3ab8 to 0xd3123
      3[0]: 1559232757.203269 WRITE    0x3f41a thru 0x48669 (0x9250 bytes)
      4[0]: 1559232757.206069 MAPREAD  0x634ca thru 0x6bc8d (0x87c4 bytes)
      5[1]: 1559232757.232410 WRITE    0x27f54 thru 0x3489f (0xc94c bytes)
      Correct content saved for comparison
      (maybe hexdump "/mnt/lustre/ffsx.sanity-dom" vs "/mnt/lustre/ffsx.sanity-dom.fsxgood")
       sanity-dom test_fsx: @@@@@@ FAIL: test_fsx failed with 120 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:5877:error()
        = /usr/lib64/lustre/tests/test-framework.sh:6164:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:6203:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:6050:run_test()
        = /usr/lib64/lustre/tests/sanity-dom.sh:130:main()
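
The "Size error" line in the log above can be cross-checked by replaying the size-changing operations from the LOG DUMP. The following is a minimal sketch, not fsx's own code: the helper name `expected_size` and the tuple encoding of operations are assumptions made for illustration. A truncate sets the size outright, and a write only grows the file if its last byte lies beyond the current end.

```python
def expected_size(ops):
    """Replay (kind, arg) operations and return the resulting file size.

    kind "truncate": arg is the new size in bytes.
    kind "write":    arg is the last byte offset written (the "thru" value).
    """
    size = 0
    for kind, arg in ops:
        if kind == "truncate":
            size = arg
        elif kind == "write":
            # A write extends the file only when it ends past the current size.
            size = max(size, arg + 1)
        else:
            raise ValueError(f"unknown operation: {kind!r}")
    return size


# Operations 1, 2, 3 and 5 from the LOG DUMP; the MAPREAD (4) does not change the size.
ops = [
    ("truncate", 0xd3ab8),  # 1[1] TRUNCATE UP
    ("truncate", 0xd3123),  # 2[0] TRUNCATE DOWN
    ("write", 0x48669),     # 3[0] WRITE thru 0x48669
    ("write", 0x3489f),     # 5[1] WRITE thru 0x3489f
]
print(hex(expected_size(ops)))  # prints 0xd3123
```

Both writes end well below 0xd3123, so the replay agrees with the "expected 0xd3123" value fsx reported, while stat returned the pre-truncate size 0xd3ab8.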
      

          Activity

            [LU-12363] sanity-dom: test_fsx failed with 120

            tappro Mikhail Pershin added a comment:

            And another missing patch that may be related:

            https://review.whamcloud.com/#/c/40223/

            tappro Mikhail Pershin added a comment:

            James, in b2_12 we are also missing two LVB fixes that concern correct file size:

            https://review.whamcloud.com/40739

            https://review.whamcloud.com/40740

            tappro Mikhail Pershin added a comment:

            At least this patch is still needed:

            https://review.whamcloud.com/#/c/40302/

            tappro Mikhail Pershin added a comment:

            James, I am checking that; probably some other DoM patch is still missing in b2_12.

            jamesanunez James Nunez (Inactive) added a comment (edited):

            I'm still seeing what looks like this issue on b2_12 even after the LU-12296 patch, https://review.whamcloud.com/#/c/40296/ (LU-12296 llite: improve ll_dom_lock_cancel), landed to b2_12.

            The most recent failure is at https://testing.whamcloud.com/test_sets/33e75ba3-d6fe-4d96-9acd-8d280671f396 . From the client test_log:

            == sanity-dom test fsx: Dual-mount fsx with DoM files ================================================ 03:11:21 (1605669081)
            Chance of close/open is 1 in 50
            Seed set to 9082
            fd 0: /mnt/lustre/ffsx.sanity-dom
            fd 1: /mnt/lustre2/ffsx.sanity-dom
            skipping zero size read
            truncating to largest ever: 0x7f60b
            000002[0] 1605669082.928997 trunc from 00000000  to  0x07f60a	(0x7f60b bytes)
                   1605669082.931277 trunc done
            ...
            000167[1] 1605669084.691482 mapread    0x081b59 thru 0x08703b	(0x54e3 bytes)
                   1605669084.691550 mmap done
                   1605669084.691577 memcpy done
                   1605669084.691589 munmap done
            READ BAD DATA: offset = 0x81b59, size = 0x54e3
            OFFSET	GOOD	BAD	RANGE
            0x81b59	0xa3a2	000000	 0x2e63
            operation# (mod 256) for the bad dataunknown, check HOLE and EXTEND ops
            LOG DUMP (170 total operations):
            1[1]: 1605669082.925562 SKIPPED (no operation)
            2[0]: 1605669082.928997 TRUNCATE UP	from 0x0 to 0x7f60b
            ...
            169[1]: 1605669084.689675 READ     0x9d146 thru 0xa5942 (0x87fd bytes)
            170[1]: 1605669084.691482 MAPREAD  0x81b59 thru 0x8703b (0x54e3 bytes)	***RRRR***
            Correct content saved for comparison
            (maybe hexdump "/mnt/lustre/ffsx.sanity-dom" vs "/mnt/lustre/ffsx.sanity-dom.fsxgood")
             sanity-dom test_fsx: @@@@@@ FAIL: test_fsx failed with 110 
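
The log suggests comparing the test file against its saved `.fsxgood` copy. As an alternative to reading two hexdumps side by side, the differing byte ranges can be listed directly. This is a minimal sketch under stated assumptions: `diff_ranges` is a hypothetical helper, not part of fsx or the Lustre test framework, and it works on contents already read into memory.

```python
def diff_ranges(good, bad):
    """Return [(offset, length), ...] for each contiguous run where the buffers differ.

    If the buffers have different lengths, the extra tail is counted as differing.
    """
    ranges = []
    start = None
    common = min(len(good), len(bad))
    for i in range(common):
        if good[i] != bad[i]:
            if start is None:
                start = i  # a new differing run begins here
        elif start is not None:
            ranges.append((start, i - start))
            start = None
    tail = abs(len(good) - len(bad))
    if start is not None:
        # The run reaches the end of the shorter buffer; include any extra tail.
        ranges.append((start, common - start + tail))
    elif tail:
        ranges.append((common, tail))
    return ranges
```

With the files named in the log, the two buffers would come from `open("/mnt/lustre/ffsx.sanity-dom", "rb").read()` and the matching `.fsxgood` path. A run of zero bytes in the bad copy would show up as one `(offset, length)` entry, comparable to the OFFSET and RANGE columns fsx prints.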
            

            tappro Mikhail Pershin added a comment:

            It looks like this is a result of the LU-12296 issue. A patch has been pushed for b2_12: https://review.whamcloud.com/40277/

            tappro Mikhail Pershin added a comment:

            Investigation shows that a simple use case reproduces this:

            [client 1] truncate UP to some size SIZE0
            [client 2] truncate DOWN to SIZE1
            [client 1] write in between SIZE1 and SIZE0
            check that the file size is the END of the last write

            While both truncates work correctly (the file size is SIZE1 on both clients without the last write), adding the last write on client 1 sets the file size to SIZE0 after all.
            I am checking this further to find the reason for that.
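
The reproduction sequence described in this comment can be sketched as a small script. This is an illustrative sketch only, not the test added by any of the patches: the function name `reproduce`, the payload, and the choice of write offset (one byte past SIZE1) are assumptions. On a dual-mount client, `path1` and `path2` would name the same file under the two mount points seen in the logs (`/mnt/lustre` and `/mnt/lustre2`).

```python
import os


def reproduce(path1, path2, size0, size1, payload=b"x" * 16):
    """Truncate up via path1, truncate down via path2, then write via path1.

    Returns (expected_size, size_seen_via_path1, size_seen_via_path2).
    """
    if size0 - size1 < len(payload) + 2:
        raise ValueError("need room for a write strictly between SIZE1 and SIZE0")
    offset = size1 + 1  # assumed offset: just past the truncated end
    fd1 = os.open(path1, os.O_RDWR | os.O_CREAT, 0o644)
    fd2 = os.open(path2, os.O_RDWR)
    try:
        os.ftruncate(fd1, size0)         # [client 1] truncate UP to SIZE0
        os.ftruncate(fd2, size1)         # [client 2] truncate DOWN to SIZE1
        os.pwrite(fd1, payload, offset)  # [client 1] write between SIZE1 and SIZE0
        expected = offset + len(payload)
        return expected, os.fstat(fd1).st_size, os.fstat(fd2).st_size
    finally:
        os.close(fd1)
        os.close(fd2)
```

On a correctly behaving file system all three returned values are equal. The failure described in this comment would show up as a reported size of SIZE0 instead of the end of the write.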

            tappro Mikhail Pershin added a comment:

            The issue still exists, in b2_12 at least, and has started occurring quite often recently.

            People

              Assignee: tappro Mikhail Pershin
              Reporter: maloo Maloo