
LU-1233: Test failure on test suite parallel-scale, subtest test_compilebench, no space left


    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/7722bc68-70e8-11e1-a89e-5254004bbbd3.

      The sub-test test_compilebench failed with the following error:

      compilebench failed: 1

      Info required for matching: parallel-scale compilebench


        Activity

          yujian Jian Yu added a comment -

          Can this ticket be closed?

          I'll back-port the patch to Lustre b2_5 branch.


          jlevi Jodi Levi (Inactive) added a comment -

          Can this ticket be closed?
          yujian Jian Yu added a comment -

          Patch landed on Lustre b2_4 branch for 2.4.2 and on master branch for 2.6.0.

          yujian Jian Yu added a comment - - edited

          Here is the patch to unlink the files created by performance-sanity.sh (through mdsrate-{create,lookup,stat}-*.sh) after a create/lookup/stat operation fails: http://review.whamcloud.com/6483

          The above patch has landed on Lustre b2_1 branch.
          Here is the patch for master branch: http://review.whamcloud.com/8265. It also needs to be cherry-picked to Lustre b2_5 branch.
          And here is the patch for Lustre b2_4 branch: http://review.whamcloud.com/8289.

          yujian Jian Yu added a comment -

          Lustre Client: 1.8.9-wc1
          Lustre Client Build: http://build.whamcloud.com/job/lustre-b1_8/258/

          Lustre Server: v2_1_6_RC1
          Lustre Server Build: http://build.whamcloud.com/job/lustre-b2_1/208/

          Network: TCP (1GigE)

          performance-sanity test_8 failed with an out-of-space error:
          https://maloo.whamcloud.com/test_sets/58eec84a-cb8c-11e2-a1fe-52540035b04c

          yujian Jian Yu added a comment -

          In performance-sanity, when creating a large number of files hit the out-of-space condition, the files that had been created were not unlinked/removed afterwards, so the test script also needs to be improved.

          Here is the patch to unlink the files created by performance-sanity.sh (through mdsrate-{create,lookup,stat}-*.sh) after a create/lookup/stat operation fails:
          http://review.whamcloud.com/6483

          With the above patch, the issue was narrowed down: only performance-sanity test 4 hit the out-of-space condition, and the tests after test 4 were not affected.

          The next step is to figure out why test 4 hits the out-of-space condition over the IB network.
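
          The sketch below is an assumption about the intent of the cleanup change, not the patch itself; the helper name, TESTDIR and NUM_FILES are hypothetical placeholders. It illustrates the idea in shell: if the create phase fails, remove whatever was already created so the out-of-space condition does not leak into later tests.

          #!/bin/bash
          # Sketch only: clean up after a failed mdsrate create/lookup/stat phase.
          TESTDIR=${TESTDIR:-/mnt/lustre/mdsrate}   # hypothetical test directory
          NUM_FILES=${NUM_FILES:-1000000}           # hypothetical file count

          run_create_phase() {
              # placeholder for the real mdsrate invocation in mdsrate-*.sh
              false
          }

          if ! run_create_phase; then
              echo "create phase failed; unlinking files under $TESTDIR"
              # remove whatever was created before the failure
              find "$TESTDIR" -maxdepth 1 -type f -delete
              exit 1
          fi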

          yujian Jian Yu added a comment -

          Lustre Branch: b2_1
          Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/204
          Distro/Arch: RHEL5.9/x86_64
          Network: IB (in-kernel OFED)

          The tests after performance-sanity were affected by the out of space issue:
          https://maloo.whamcloud.com/test_sessions/bebd9a1c-c5c8-11e2-9bf1-52540035b04c

          yujian Jian Yu added a comment -

          Lustre Client: 1.8.9-wc1
          Lustre Client Build: http://build.whamcloud.com/job/lustre-b1_8/258/

          Lustre Server: v2_1_5_RC1
          Lustre Server Build: http://build.whamcloud.com/job/lustre-b2_1/191/

          Network: TCP (1GigE)

          MDSSIZE=2097152
          OSTSIZE=149718677
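
          For context (an assumption, not taken from this ticket): in the Lustre test framework these sizes are normally supplied as environment variables, in KB, before the suite is launched, e.g.:

          export MDSSIZE=2097152      # ~2 GB MDT (value in KB)
          export OSTSIZE=149718677    # ~143 GB OST (value in KB)
          # then run the suite as usual, e.g. sh performance-sanity.sh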
          

          The tests after performance-sanity were affected by the out of space issue:
          https://maloo.whamcloud.com/test_sessions/3bb63464-92c7-11e2-b06e-52540035b04c

          Dmesg on the MDS node showed:

          Lustre: DEBUG MARKER: ===== mdsrate-stat-large.sh Test preparation: creating 1000000 files.
          LustreError: 21464:0:(mdd_dir.c:1889:mdd_create()) error on stripe info copy -28 
          LustreError: 21464:0:(mdd_dir.c:1889:mdd_create()) error on stripe info copy -28 
          Lustre: DEBUG MARKER: /usr/sbin/lctl mark  performance-sanity test_8: @@@@@@ FAIL: test_8 failed with 1
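
          For reference (not from the ticket): the -28 in the LustreError lines above is a negated Linux errno, and errno 28 is ENOSPC ("No space left on device"), matching the out-of-space symptom:

          $ grep -w ENOSPC /usr/include/asm-generic/errno-base.h
          #define ENOSPC          28      /* No space left on device */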
          
          yujian Jian Yu added a comment -

          Lustre Branch: b2_1
          Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/186
          Distro/Arch: RHEL6.3/x86_64
          Network: IB (in-kernel OFED)

          The issue is still blocking the tests that follow performance-sanity in the full test group from running under the IB network configuration:
          https://maloo.whamcloud.com/test_sessions/6f41a40a-8b40-11e2-aa18-52540035b04c

          yujian Jian Yu added a comment -

          Lustre Branch: b2_1
          Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/164
          Network: o2ib (in-kernel OFED)

          MDSSIZE=2097152
          OSTSIZE=31061817
          

          performance-sanity: https://maloo.whamcloud.com/test_sets/ac776a56-68dd-11e2-ac0a-52540035b04c
          parallel-scale: https://maloo.whamcloud.com/test_sets/d2cc2afc-68dd-11e2-ac0a-52540035b04c

          yujian Jian Yu added a comment -

          Lustre Branch: b1_8
          Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/249
          Distro/Arch: RHEL5.8/x86_64(server), RHEL6.3/x86_64(client)
          Network: TCP

          MDSSIZE=2097152
          OSTSIZE=7416428
          

          The compilebench tests in parallel-scale{,-nfsv3,-nfsv4} all failed with:

          LustreError: 23875:0:(filter.c:3459:filter_precreate()) create failed rc = -28
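
          Not from the ticket, but for context: filter_precreate() returning rc = -28 (ENOSPC) means the OST ran out of space while precreating objects. A quick way to confirm which targets are full (the /mnt/lustre mount point is an assumption):

          lfs df -h /mnt/lustre    # per-MDT/OST block usage
          lfs df -i /mnt/lustre    # per-MDT/OST inode usage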
          

          Maloo reports:
          parallel-scale: https://maloo.whamcloud.com/test_sets/e021f142-6337-11e2-ae8b-52540035b04c
          parallel-scale-nfsv3: https://maloo.whamcloud.com/test_sets/85799c1c-6338-11e2-ae8b-52540035b04c
          parallel-scale-nfsv4: https://maloo.whamcloud.com/test_sets/ffb1cbd0-6338-11e2-ae8b-52540035b04c

          With the following values, the same tests passed on the same Lustre b1_8 build over a TCP network on RHEL5.8/x86_64 distro/arch (both server and client):

          MDSSIZE=2097152
          OSTSIZE=11311139
          

          Maloo report: https://maloo.whamcloud.com/test_sessions/820df576-6353-11e2-ae8b-52540035b04c


          People

            Assignee: yujian (Jian Yu)
            Reporter: maloo (Maloo)