[LU-7656] replay-single_70c test failed: tar: Exiting with failure status due to previous errors

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Lustre 2.9.0
    • Lustre 2.8.0
    • Severity: 3

    Description

      == replay-single test 70c: tar 1mdts recovery == 02:32:52 (1441506772)
      Starting client fre1211,fre1212:  -o user_xattr,flock fre1209@tcp:/lustre /mnt/lustre
      Started clients fre1211,fre1212: 
      fre1209@tcp:/lustre on /mnt/lustre type lustre (rw,user_xattr,flock)
      fre1209@tcp:/lustre on /mnt/lustre type lustre (rw,user_xattr,flock)
      Started tar 8730
      tar: Removing leading `/' from member names
      tar: Removing leading `/' from member names
      tar: Removing leading `/' from member names
      tar: Removing leading `/' from member names
      tar: Removing leading `/' from member names
      tar: Removing leading `/' from member names
      Filesystem          1K-blocks  Used Available Use% Mounted on
      fre1209@tcp:/lustre   1377952 68056   1233908   6% /mnt/lustre
      tar: Removing leading `/' from member names
      test_70c fail mds1 1 times
      Failing mds1 on fre1209
      Stopping /mnt/mds1 (opts:) on fre1209
      pdsh@fre1211: fre1209: ssh exited with exit code 1
      reboot facets: mds1
      Failover mds1 to fre1209
      02:35:20 (1441506920) waiting for fre1209 network 900 secs ...
      02:35:20 (1441506920) network interface is UP
      mount facets: mds1
      Starting mds1: -o rw,user_xattr  /dev/vdb /mnt/mds1
      fre1209: mount.lustre: set /sys/block/vdb/queue/max_sectors_kb to 2147483647
      fre1209: 
      Started lustre-MDT0000
      fre1212: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 11 sec
      fre1211: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 11 sec
      tar: Removing leading `/' from member names
      tar: Removing leading `/' from member names
      tar: Removing leading `/' from member names
      tar: Removing leading `/' from member names
      tar: Removing leading `/' from member names
      Filesystem          1K-blocks  Used Available Use% Mounted on
      fre1209@tcp:/lustre   1377952 68056   1237060   6% /mnt/lustre
      test_70c fail mds1 2 times
      Failing mds1 on fre1209
      Stopping /mnt/mds1 (opts:) on fre1209
      pdsh@fre1211: fre1209: ssh exited with exit code 1
      reboot facets: mds1
      Failover mds1 to fre1209
      02:38:01 (1441507081) waiting for fre1209 network 900 secs ...
      02:38:01 (1441507081) network interface is UP
      mount facets: mds1
      Starting mds1: -o rw,user_xattr  /dev/vdb /mnt/mds1
      fre1209: mount.lustre: set /sys/block/vdb/queue/max_sectors_kb to 2147483647
      fre1209: 
      Started lustre-MDT0000
      fre1212: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 9 sec
      fre1211: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 9 sec
      Resetting fail_loc on all nodes.../usr/lib64/lustre/tests/test-framework.sh: line 2976:  8730 Killed                  ( while true; do
          test_mkdir -p -c$MDSCOUNT $DIR/$tdir || break; if [ $MDSCOUNT -ge 2 ]; then
              $LFS setdirstripe -D -c$MDSCOUNT $DIR/$tdir || error "set default dirstripe failed";
          fi; cd $DIR/$tdir || break; tar cf - /etc | tar xf - || error "tar failed"; cd $DIR || break; rm -rf $DIR/$tdir || break;
      done )
      done.
      tar: etc/ssl: Cannot stat: No such file or directory
      tar: etc/sysconfig/network-scripts: Cannot stat: No such file or directory
      tar: etc/sysconfig: Cannot stat: No such file or directory
      tar: etc/pam.d: Cannot stat: No such file or directory
      tar: etc/rc.d/rc0.d: Cannot stat: No such file or directory
      tar: etc/rc.d/rc5.d: Cannot stat: No such file or directory
      tar: etc/rc.d/rc2.d: Cannot stat: No such file or directory
      tar: etc/rc.d/rc4.d: Cannot stat: No such file or directory
      tar: etc/rc.d/rc6.d: Cannot stat: No such file or directory
      tar: etc/rc.d/rc3.d: Cannot stat: No such file or directory
      tar: etc/rc.d/rc1.d: Cannot stat: No such file or directory
      tar: etc/rc.d: Cannot stat: No such file or directory
      tar: etc/profile.d: Cannot stat: No such file or directory
      tar: etc/alternatives: Cannot stat: No such file or directory
      tar: Exiting with failure status due to previous errors
      
      

      Attachments

        1. 70c.lctl.tgz (677 kB, Noopur Maheshwari)

        Activity


          Joseph Gmitter (Inactive) added a comment:
          Landed to master for 2.9.0

          Gerrit Updater added a comment:
          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18732/
          Subject: LU-7656 tests: tar fix for replay-single/70c
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 13f4d2a5ab81b479fcc1cd2263c2cd8db8b616c5

          Gerrit Updater added a comment:
          Noopur Maheshwari (noopur.maheshwari@seagate.com) uploaded a new patch: http://review.whamcloud.com/18732
          Subject: LU-7656 tests: tar fix for replay-single/70c
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: b346da54ead50afc6f72615a33f4ed0e1f27b41e

          Noopur Maheshwari (Inactive) added a comment:

          Hello James,

          I found that this is not a tar utility issue but a test case issue.

          The test case uses kill -0, which sends no signal: it only checks whether the process exists and whether the caller has permission to signal it.
          kill -0 neither kills tar nor waits for it to complete.

          tar runs in an infinite loop, and the cleanup that removes its files interferes with it and causes tar to fail.
          The main process should wait for the tar process to complete before cleanup, and then exit gracefully. I'll push a patch for this.

          Could you please reopen the ticket?

          Thanks
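The fix proposed in this comment (stop the background tar workload and wait for it before cleanup) can be sketched in shell as below. This is a minimal illustration under stated assumptions, not the actual change in patch 18732: WORKDIR, SRCDIR and STOPFILE are hypothetical names, a small private tree stands in for /etc, and a stop file is used to request the stop instead of a signal.

```shell
#!/bin/bash
# Minimal sketch, not the replay-single.sh code: WORKDIR, SRCDIR and
# STOPFILE are hypothetical names, and a small private tree stands in
# for /etc so the example is self-contained.
set -u

WORKDIR=$(mktemp -d)
SRCDIR=$(mktemp -d)
STOPFILE=$WORKDIR/stop
mkdir -p "$SRCDIR/etc/sub"
echo data > "$SRCDIR/etc/sub/file"

# Background workload: copy the tree with a tar pipeline, then delete
# the copy, until a stop file appears.
(
	while [ ! -e "$STOPFILE" ]; do
		mkdir -p "$WORKDIR/dst" || break
		tar cf - -C "$SRCDIR" etc | tar xf - -C "$WORKDIR/dst" || exit 1
		rm -rf "$WORKDIR/dst"
	done
) &
loop_pid=$!

sleep 1

# kill -0 sends no signal. It only reports whether the process exists
# and may be signalled; it neither stops the loop nor waits for it.
running=no
if kill -0 "$loop_pid" 2>/dev/null; then
	running=yes
fi
echo "loop running: $running"

# Request a stop, then wait: the subshell exits only after its current
# tar pipeline has finished, so no tar is still active afterwards.
touch "$STOPFILE"
wait "$loop_pid"
loop_status=$?
echo "loop exit status: $loop_status"

# Cleanup is safe only after wait has returned.
rm -rf "$WORKDIR" "$SRCDIR"
```

The ordering is the point of the sketch: `wait` returns only once the loop, and therefore its last tar pipeline, has exited, so the final `rm -rf` cannot race with a running tar. A `kill -0` check alone gives no such guarantee.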

          James Nunez (Inactive) added a comment:

          Noopur - In the patch, you stated "Changing directory to /tmp does not help in this case. We see these tar failures without Lustre mounted as well. There is a problem with the tar utility, OS or VM (kvm or vmware). This isn't a lustre problem. Abandoning."

          So, I am closing this ticket as "Not a Bug".

          Noopur Maheshwari (Inactive) added a comment:

          Hello Andreas,

          Dangling symlinks do not cause tar to fail: I created a dangling symlink in a temporary folder and ran tar on that folder, and tar did not fail.
          I tried "tar -cf --ignore-failed-read": it reports a warning instead of an error for failed reads, so yes, it avoids the tar error during read.
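The two experiments in this comment can be reproduced with a short shell sketch, assuming GNU tar; the scratch directory and file names are hypothetical. It archives a directory holding a dangling symlink (by default tar stores the link itself and does not follow it), then archives a path that does not exist, with and without --ignore-failed-read. In the sketch the option is placed before -cf: GNU tar's -f takes the next argument as the archive name, so an option written directly after -f would be used as the file name.

```shell
#!/bin/bash
# Sketch of the two experiments described above; GNU tar is assumed and
# the scratch directory and file names are hypothetical.
scratch=$(mktemp -d)
mkdir "$scratch/src"

# A dangling symlink: its target does not exist.
ln -s "$scratch/does-not-exist" "$scratch/src/dangling"

# 1. tar archives the symlink itself (it is not followed), so this succeeds.
tar cf "$scratch/links.tar" -C "$scratch" src
symlink_status=$?

# 2. A path that cannot be stat'ed is an error by default ...
tar cf "$scratch/missing.tar" -C "$scratch" missing 2>/dev/null
default_status=$?

# ... but only a warning with --ignore-failed-read. The option goes before
# -cf because -f consumes the next argument as the archive name.
tar --ignore-failed-read -cf "$scratch/missing2.tar" -C "$scratch" missing 2>/dev/null
ignore_status=$?

echo "dangling symlink: $symlink_status"
echo "missing path, default: $default_status"
echo "missing path, --ignore-failed-read: $ignore_status"

rm -rf "$scratch"
```

With GNU tar the symlink case should exit 0, the default missing-path case should exit non-zero with "Cannot stat", and the --ignore-failed-read case should exit 0 after printing a warning.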

          Andreas Dilger added a comment:

          Have you verified that this is related to trying to archive dangling symlinks from the source /etc folder? If not, what is the source of the error? Have you tried using "tar -cf --ignore-failed-read" to avoid an error on tar during read? It may also be that these errors are generated at restore time because the files are being deleted during cleanup while tar is still running.

          Joseph Gmitter (Inactive) added a comment:

          James,
          Can you have a look at the patch?
          Thanks.
          Joe

          Gerrit Updater added a comment:
          Noopur Maheshwari (noopur.maheshwari@seagate.com) uploaded a new patch: http://review.whamcloud.com/17959
          Subject: LU-7656 tests: tar a temporary folder
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 76c60777cc75f9d1d6c870ce986d793517e21969

          People

            James Nunez (Inactive)
            Noopur Maheshwari (Inactive)
            Votes: 0
            Watchers: 5
