[LU-3878] sanity-benchmark test fsx: Bus error

Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.1, Lustre 2.5.0, Lustre 2.6.0
    • Labels: None

    • Lustre build: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1)
      Distro/Arch: RHEL6.4/x86_64 + FC18/x86_64 (Server + Client)
    • Severity: 3
    • Rank: 10059

    Description

      sanity-benchmark test fsx failed as follows:

      == sanity-benchmark test fsx: fsx ==================================================================== 22:15:58 (1378185358)
      debug=0
      Using: fsx -c 50 -p 1000 -S 29278 -P /tmp -l 206139         -N 100000  /mnt/lustre/f0.fsxfile
      Chance of close/open is 1 in 50
      Seed set to 29278
      truncating to largest ever: 0xd3af
      /usr/lib64/lustre/tests/sanity-benchmark.sh: line 186: 12471 Bus error               $CMD
       sanity-benchmark test_fsx: @@@@@@ FAIL: fsx failed 
      

      Maloo report: https://maloo.whamcloud.com/test_sets/becb9218-14ef-11e3-ac48-52540035b04c

      This is a regression on Lustre b2_4 branch.

          Activity

            yujian Jian Yu added a comment -

            Yes, the failure still occurred on the latest Lustre b2_4 branch with FSTYPE=zfs:
            https://maloo.whamcloud.com/test_sets/17d950e4-58ba-11e3-83d7-52540035b04c
            https://maloo.whamcloud.com/test_sets/4767b964-57ce-11e3-8d5c-52540035b04c
            https://maloo.whamcloud.com/test_sets/f21a0152-4ab1-11e3-8252-52540035b04c

            In the above test reports, all of the iozone tests failed as follows:

            write: No space left on device
            

            or

            Write error No space left on device (rc = -1, len = 4194304)
            

            So, it seems that the out of space failure of iozone caused the fsx failure.

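To check whether other runs show the same pattern, a test log can be scanned for the out-of-space message. The following is only a minimal sketch: the function name `find_enospc_lines` and the sample log are hypothetical, and the sample lines are copied from the iozone errors and the fsx output quoted in this ticket.

```python
import re

# Both iozone error lines quoted in the comment contain this text.
ENOSPC_PATTERN = re.compile(r"No space left on device")


def find_enospc_lines(log_text):
    """Return (line_number, line) pairs that report an out-of-space error."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ENOSPC_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical log assembled from lines quoted in this ticket.
sample_log = """\
write: No space left on device
Write error No space left on device (rc = -1, len = 4194304)
truncating to largest ever: 0xd3af
"""

for number, line in find_enospc_lines(sample_log):
    print(number, line)
```

A log with no hits does not rule out a full file system, since fsx itself reported only a bus error here.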

            jay Jinshan Xiong (Inactive) added a comment -

            The previous iozone run used up all the space.

            I can't connect this symptom to that patch. From what I have seen so far, you reverted that patch on Sep 24, but the failure still occurred after that.
            green Oleg Drokin added a comment -

            But why does it stop after the patches are reverted?
            Also I don't think we see any enospace errors for fsx runs in the logs?


            jay Jinshan Xiong (Inactive) added a comment -

            The fsx failure is probably fallout from the earlier iozone failure. iozone used up all the disk space on the OSTs, so the client had no grants left, which made mkwrite() fail.
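
If leftover data from an earlier test is the cause, a coarse free-space check before starting fsx could flag the condition early. This is a sketch under assumptions, not part of the test suite: `free_bytes` and `enough_space` are hypothetical helpers, and `os.statvfs` reports only aggregate file system space, so it cannot see per-OST usage or client grants.

```python
import os


def free_bytes(path):
    """Bytes available to an unprivileged user on the file system holding path."""
    stats = os.statvfs(path)
    return stats.f_bavail * stats.f_frsize


def enough_space(path, required_bytes):
    """True if the file system holding path reports at least required_bytes free."""
    return free_bytes(path) >= required_bytes


# 206139 is the -l value from the fsx command line in the description.
# "." stands in for the test directory; the ticket's run used /mnt/lustre.
print(enough_space(".", 206139))
```

A passing check here would not prove that mkwrite() can succeed, only that the file system as a whole is not reported full.
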
            yujian Jian Yu added a comment -

            Here is the search result on Maloo:
            http://tinyurl.com/ozo5c7a

            The failure occurred not only on zfs and b2_4, but also on ldiskfs and master/b2_5:
            https://maloo.whamcloud.com/test_sets/87757614-44d1-11e3-8c03-52540035b04c
            https://maloo.whamcloud.com/test_sets/6f3d0b5e-4cc7-11e3-826a-52540035b04c
            https://maloo.whamcloud.com/test_sets/2582fa8c-3bbf-11e3-b062-52540035b04c

            green Oleg Drokin added a comment -

            Is this only happening on zfs?

            Only on b2_4, but not on master?

            yujian Jian Yu added a comment -

            Lustre build: http://build.whamcloud.com/job/lustre-b2_4/47/
            Distro/Arch: RHEL6.4/x86_64
            FSTYPE=zfs

            sanity-benchmark test fsx hit the same failure again:
            https://maloo.whamcloud.com/test_sets/5d2a22d4-43a9-11e3-942a-52540035b04c

            yujian Jian Yu added a comment -

            After patch http://review.whamcloud.com/7481 was reverted from Lustre b2_4 branch, the failure did not occur on Lustre 2.4.1 RC2.

            yujian Jian Yu added a comment -

            By searching on Maloo, I found that test fsx passed on FC18 on Lustre b2_4 build #40 and previous builds. Builds #41, #42, #43 were not tested on FC18. It seems that the culprit is in build #42.

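The pass/fail data in this ticket bounds where the regression could have landed. The sketch below computes that window from the last passing and first failing build; `suspect_builds` is a hypothetical helper, build #40 passing comes from the comment above, and build #44 failing comes from the description.

```python
def suspect_builds(results):
    """Return builds after the last pass and up to the first failure.

    results maps build number -> True (passed) or False (failed);
    untested builds are simply absent from the mapping.
    """
    failed = sorted(build for build, ok in results.items() if not ok)
    if not failed:
        return []
    first_fail = failed[0]
    earlier_passes = [b for b, ok in results.items() if ok and b < first_fail]
    if not earlier_passes:
        return []
    return list(range(max(earlier_passes) + 1, first_fail + 1))


# FC18 results named in this ticket: #40 passed, #44 (2.4.1 RC1) failed.
results = {40: True, 44: False}
print(suspect_builds(results))  # [41, 42, 43, 44]
```

The window is wider than the single build named in the comment, so narrowing it to #42 relies on information beyond these two data points.
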
            yujian Jian Yu added a comment -

            Lustre build: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1)
            Distro/Arch: RHEL6.4/x86_64
            FSTYPE=zfs

            sanity-benchmark test fsx also hit the same failure:
            https://maloo.whamcloud.com/test_sets/e004601a-1556-11e3-8938-52540035b04c

            FYI, here is the query result of sanity-benchmark test_fsx with "FAIL" status on Lustre b2_4 branch:
            http://tinyurl.com/kxheu43

            People

              Assignee: green Oleg Drokin
              Reporter: yujian Jian Yu
              Votes: 0
              Watchers: 8
