Details

    • Technical task
    • Resolution: Fixed
    • Blocker
    • Lustre 2.5.0
    • Lustre 2.5.0
    • 9801

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite runs:
      http://maloo.whamcloud.com/test_sets/21cdd37a-094d-11e3-b004-52540035b04c
      http://maloo.whamcloud.com/test_sets/2ee8358a-0931-11e3-b004-52540035b04c
      http://maloo.whamcloud.com/test_sets/25a97004-094c-11e3-a9b0-52540035b04c

      The sub-test test_8 failed with the following error:

      request on 0x200008101:0x2:0x0 is not SUCCEED

      Info required for matching: sanity-hsm 8

        Activity

          [LU-3791] sanity-hsm test_8: 'request on 0x200002341:0x2:0x0 is not SUCCEED'

          keith Keith Mannthey (Inactive) added a comment -

          Also, perhaps the test could be changed to detect this incompatibility and "skip" when local resources are not compliant? These tests will be run in all sorts of environments.
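
A pre-flight probe along the following lines is one way such a skip could be implemented. This is a minimal sketch, not code from sanity-hsm.sh: the function name `archive_allows_chown`, the probe file name, and the default owner `0:0` (root:root) are all assumptions made for illustration. It mirrors the manual check reported elsewhere in this thread: create a file in the archive directory, then try to change its owner.

```shell
#!/bin/bash
# Hypothetical helper (not part of sanity-hsm.sh).
# Returns 0 if a file created in directory $1 can be chowned to owner $2
# (default 0:0, i.e. root:root), 1 if the chown is refused, and 2 if the
# probe file cannot be created at all.
archive_allows_chown() {
    local dir="$1"
    local owner="${2:-0:0}"
    local probe
    local rc=0

    # mktemp fails if the directory is missing or not writable.
    probe=$(mktemp "$dir/chown_probe.XXXXXX" 2>/dev/null) || return 2

    # On an NFS export with root_squash, this is the step expected to fail.
    chown "$owner" "$probe" 2>/dev/null || rc=1

    rm -f "$probe"
    return $rc
}
```

A test could then guard itself with something like `archive_allows_chown "$archive_dir" || skip "archive does not allow chown"`, where `$archive_dir` and the `skip` helper stand in for whatever the test framework actually provides.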
          jhammond John Hammond added a comment -

          Bobbie, "won't fix" was probably not the best resolution here, since this issue was fixed by the configuration changes on rosso. The test session you linked above was run before that fix was applied, however.


          bobbielind Bobbie Lind (Inactive) added a comment -

          I realize this ticket is in a "won't fix" resolved status, but I'm still hitting it even with the temporary workaround from TEI-534 in place: https://maloo.whamcloud.com/test_sets/db1ad8ea-170e-11e3-9d30-52540035b04c

          jay Jinshan Xiong (Inactive) added a comment -

          Hi Aurelien,

          To support multiple agents, we're using an NFS share so that all agents can access the same archive. To archive a Lustre file, lhsmtool_posix has to copy the file data, xattrs and attrs to the corresponding file in the archive. The problem is that copying the attrs requires changing the file owner on the NFS share.

          Rosso, a cluster used to run autotest, has root_squash enabled, so every attempt to change the file owner to root fails. This in turn causes the archive operation to fail, which is why we have seen so many sanity-hsm failures recently.

          TEI-534 is an internal task asking the administrator to disable root_squash on the NFS server.

          adegremont Aurelien Degremont (Inactive) added a comment -

          Is it possible to know what is in TEI-534?

          jay Jinshan Xiong (Inactive) added a comment -

          I believe this issue will no longer exist after TEI-534 is fixed.

          jhammond John Hammond added a comment -

          Please see http://review.whamcloud.com/7581. This change passes the --no-attr and --no-xattr options to the copytool in sanity-hsm.sh. Let's see how it does.
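
For illustration, the effect of such a change can be sketched as a small helper that assembles the copytool command line. This is not the code from the review: the function name `build_copytool_cmd`, its parameters, and the example paths are hypothetical. Only `--no-attr` and `--no-xattr` come from the comment above; `--daemon`, `--hsm-root`, and `--archive` are assumed here to be valid lhsmtool_posix options and should be checked against the installed version.

```shell
#!/bin/bash
# Hypothetical helper (not from http://review.whamcloud.com/7581).
# Prints an lhsmtool_posix command line for the given archive root,
# archive ID, and Lustre mount point. When the fourth argument is "yes"
# (the default), the options that skip copying attrs and xattrs are added,
# so the copytool never needs to chown files in the archive.
build_copytool_cmd() {
    local hsm_root="$1"
    local archive_id="$2"
    local mountpoint="$3"
    local skip_attrs="${4:-yes}"

    local cmd="lhsmtool_posix --daemon --hsm-root $hsm_root --archive $archive_id"
    if [ "$skip_attrs" = "yes" ]; then
        cmd="$cmd --no-attr --no-xattr"
    fi
    echo "$cmd $mountpoint"
}

# Example with made-up paths:
build_copytool_cmd /tmp/arc 2 /mnt/lustre
```

The trade-off is that the archived copy no longer carries the original owner, mode, or extended attributes, which is acceptable for a test run but may not be for a production archive.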
          jhammond John Hammond added a comment -

          But why are there no error messages in the copytool logs?


          jay Jinshan Xiong (Inactive) added a comment -

          I think we can fix this issue by setting the correct permissions on the NFS share.

          jay Jinshan Xiong (Inactive) added a comment -

          I just ran an experiment on rosso, and it confirmed my guess:

          [root@wtm-12vm2 0000]# ls -l
          total 7172
          -rw------- 1 nfsnobody nfsnobody 7340032 Sep  3 13:21 0x200002341:0x211:0x0_tmp
          -rw------- 1 nfsnobody nfsnobody      56 Sep  3 13:21 0x200002341:0x211:0x0_tmp.lov
          [root@wtm-12vm2 0000]# cp /etc/passwd .
          [root@wtm-12vm2 0000]# chown root.root passwd
          chown: changing ownership of `passwd': Operation not permitted
          [root@wtm-12vm2 0000]# pwd
          /home/cgearing/.autotest/shared_dir/2013-09-03/040347-70339907358780/arc1/0211/0000/2341/0000/0002/0000

          I did the same thing on toro and it worked, which is why the tests failed only some of the time: if the test ran on rosso, it failed. Pretty nasty, huh?

          People

            jhammond John Hammond
            maloo Maloo
            Votes: 0
            Watchers: 12
