Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.3.0, Lustre 2.4.0
    • Affects Version/s: Lustre 2.2.0, Lustre 2.3.0, Lustre 2.4.0
    • Labels: None
    • 3
    • 4431

    Description

      This issue was created by maloo for Minh Diep <mdiep@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/0e62c5d6-fa97-11e1-887d-52540035b04c.

      survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: NAME=autotest_config sh rpc.sh check_logdir /home/autotest/.autotest/shared_dir/2012-09-08/054638-7fbcd5d87e28
      lfsck : @@@@@@ FAIL: /home/autotest/.autotest/shared_dir/2012-09-08/054638-7fbcd5d87e28 isn't a shared directory
      Trace dump:
      = /usr/lib64/lustre/tests/test-framework.sh:3642:error_noexit()
      = /usr/lib64/lustre/tests/test-framework.sh:3664:error()
      = /usr/lib64/lustre/tests/test-framework.sh:3219:generate_db()
      = /usr/lib64/lustre/tests/lfsck.sh:260:main()

      It's strange to me that running different Lustre versions (b2_2 and b2_3) on the server and client would cause the above issue. I tried on a cluster running b2_3 everywhere and it worked for the exact same directory (i.e. /scratch/mdiep/tmp).

      Attachments

        Activity

          [LU-1912] Test failure on test suite lfsck
          pjones Peter Jones added a comment -

          Landed for 2.3 and 2.4

          yujian Jian Yu added a comment -

          Patch for master branch: http://review.whamcloud.com/4021
          It also needs to be cherry-picked to b2_3 branch.

          yujian Jian Yu added a comment -

          Lustre client build: http://build.whamcloud.com/job/lustre-b2_2/17
          Lustre server build: http://build.whamcloud.com/job/lustre-b2_3/19
          Distro/Arch: RHEL6.3/x86_64
          The same issue occurred: https://maloo.whamcloud.com/test_sets/d84f0c0e-ff5d-11e1-bce0-52540035b04c
          yujian Jian Yu added a comment -

          Hi Peter,
          For a 2.3 client and 2.2 server, we can add a Lustre version check in the 2.3 test suite to skip the test.
          For a 2.2 client and 2.3 server, I don't think there is a proper way.

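The version-check-and-skip idea above could be sketched as follows. This is only an illustration, not the landed patch: `version_code` mirrors the major.minor.patch packing used by the Lustre test scripts, and `check_interop_skip` plus the hard-coded peer versions are hypothetical names invented here.

```shell
#!/bin/bash
# Sketch: skip lfsck.sh when the peer runs Lustre < 2.3, whose
# check_write_access uses a different check-file name (see below).

version_code() {
	# Pack "major.minor.patch" into one comparable integer,
	# in the style of the test-framework version macros.
	local IFS=.
	set -- $1
	echo $(( ($1 << 16) | ($2 << 8) | $3 ))
}

check_interop_skip() {
	# $1: the Lustre version reported by the other side
	# (in the real suite this would come from lctl get_param version).
	local peer_version=$1
	if (( $(version_code $peer_version) < $(version_code 2.3.0) )); then
		echo "SKIP: lfsck.sh needs both sides >= 2.3.0 for the shared-dir check"
		return 0
	fi
	return 1
}

check_interop_skip 2.2.0                           # prints the SKIP line
check_interop_skip 2.3.0 || echo "2.3.0: test runs"
```

This only covers the 2.3-client/2.2-server direction; as noted above, a 2.2 client cannot be taught to skip without landing something on b2_2.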
          mdiep Minh Diep added a comment -

          If approved, I think we can check whether the server or client is running b2_2 or b2_3 and skip the test.

          pjones Peter Jones added a comment -

          Yujian

          We don't have any plans to land anything to b2_2 at this time. Is there a way to just skip this test when interoperating with 2.2? If not, then this test failure will disappear when we switch to the 2.4 interop matrix.

          Peter

          yujian Jian Yu added a comment -

          The patch for LU-1255 in http://review.whamcloud.com/#change,2376 needs to be landed on Lustre b2_2 branch.

          pjones Peter Jones added a comment -

          Yujian

          Could you please look into this one?

          Thanks

          Peter

          mdiep Minh Diep added a comment -

          on b2_3

          check_write_access() {
              local dir=$1
              local node
              local file

              for node in $(nodes_list); do
                  file=$dir/check_file.$(short_hostname $node)
                  if [[ ! -f "$file" ]]; then
                      # Logdir not accessible/writable from this node.
                      return 1
                  fi
                  rm -f $file || return 1
              done
              return 0
          }

          on b2_2

          check_write_access() {
              local dir=$1
              for node in $(nodes_list); do
                  if [ ! -f "$dir/node.$(short_hostname ${node}).yml" ]; then
                      # Logdir not accessible/writable from this node.
                      return 1
                  fi
              done
              rm -f $dir/node.*.yml
              return 0
          }

          This means running b2_3 <-> b2_2 will cause this issue, because the check file created on one side differs from the file the other side expects to exist. I don't have a good solution for this. Let's check with Yujian.

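One conceivable way around the mismatch would be a check_write_access that accepts either branch's file name. This is a sketch only, not the landed patch; nodes_list and short_hostname are stubbed here so the example is self-contained, whereas in the test suite they come from test-framework.sh.

```shell
#!/bin/bash
# Stubs for illustration: a two-node cluster.
nodes_list() { echo "client-1 mds-1"; }
short_hostname() { echo "$1"; }

check_write_access() {
	local dir=$1
	local node file
	for node in $(nodes_list); do
		# Accept the b2_3 name (check_file.<node>) or the b2_2 name
		# (node.<node>.yml), since a mixed cluster may create either.
		for file in "$dir/check_file.$(short_hostname $node)" \
			    "$dir/node.$(short_hostname $node).yml"; do
			[[ -f "$file" ]] && { rm -f "$file"; continue 2; }
		done
		# Neither name found: logdir not accessible/writable from this node.
		return 1
	done
	return 0
}

# Demo: one node wrote the b2_3-style file, the other the b2_2-style file.
dir=$(mktemp -d)
touch "$dir/check_file.client-1" "$dir/node.mds-1.yml"
check_write_access "$dir" && echo "shared dir OK"    # prints "shared dir OK"
rmdir "$dir"
```

Even so, this would only help once landed on both branches, which runs into the no-landings-on-b2_2 constraint discussed above.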

          People

            Assignee: yujian Jian Yu
            Reporter: maloo Maloo
            Votes: 0
            Watchers: 3
