[LU-7656] replay-single_70c test failed tar: Exiting with failure status due to previous errors Created: 12/Jan/16 Updated: 13/May/16 Resolved: 13/May/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Noopur Maheshwari (Inactive) | Assignee: | James Nunez (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Environment: |
Configuration : 4 Node - ( 1 MDS/1 OSS/2 Clients) |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
== replay-single test 70c: tar 1mdts recovery == 02:32:52 (1441506772)
Starting client fre1211,fre1212: -o user_xattr,flock fre1209@tcp:/lustre /mnt/lustre
Started clients fre1211,fre1212:
fre1209@tcp:/lustre on /mnt/lustre type lustre (rw,user_xattr,flock)
fre1209@tcp:/lustre on /mnt/lustre type lustre (rw,user_xattr,flock)
Started tar 8730
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
Filesystem 1K-blocks Used Available Use% Mounted on
fre1209@tcp:/lustre 1377952 68056 1233908 6% /mnt/lustre
tar: Removing leading `/' from member names
test_70c fail mds1 1 times
Failing mds1 on fre1209
Stopping /mnt/mds1 (opts:) on fre1209
pdsh@fre1211: fre1209: ssh exited with exit code 1
reboot facets: mds1
Failover mds1 to fre1209
02:35:20 (1441506920) waiting for fre1209 network 900 secs ...
02:35:20 (1441506920) network interface is UP
mount facets: mds1
Starting mds1: -o rw,user_xattr /dev/vdb /mnt/mds1
fre1209: mount.lustre: set /sys/block/vdb/queue/max_sectors_kb to 2147483647
fre1209:
Started lustre-MDT0000
fre1212: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 11 sec
fre1211: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 11 sec
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
Filesystem 1K-blocks Used Available Use% Mounted on
fre1209@tcp:/lustre 1377952 68056 1237060 6% /mnt/lustre
test_70c fail mds1 2 times
Failing mds1 on fre1209
Stopping /mnt/mds1 (opts:) on fre1209
pdsh@fre1211: fre1209: ssh exited with exit code 1
reboot facets: mds1
Failover mds1 to fre1209
02:38:01 (1441507081) waiting for fre1209 network 900 secs ...
02:38:01 (1441507081) network interface is UP
mount facets: mds1
Starting mds1: -o rw,user_xattr /dev/vdb /mnt/mds1
fre1209: mount.lustre: set /sys/block/vdb/queue/max_sectors_kb to 2147483647
fre1209:
Started lustre-MDT0000
fre1212: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 9 sec
fre1211: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 9 sec
Resetting fail_loc on all nodes.../usr/lib64/lustre/tests/test-framework.sh: line 2976: 8730 Killed ( while true; do
test_mkdir -p -c$MDSCOUNT $DIR/$tdir || break; if [ $MDSCOUNT -ge 2 ]; then
$LFS setdirstripe -D -c$MDSCOUNT $DIR/$tdir || error "set default dirstripe failed";
fi; cd $DIR/$tdir || break; tar cf - /etc | tar xf - || error "tar failed"; cd $DIR || break; rm -rf $DIR/$tdir || break;
done )
done.
tar: etc/ssl: Cannot stat: No such file or directory
tar: etc/sysconfig/network-scripts: Cannot stat: No such file or directory
tar: etc/sysconfig: Cannot stat: No such file or directory
tar: etc/pam.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc0.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc5.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc2.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc4.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc6.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc3.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc1.d: Cannot stat: No such file or directory
tar: etc/rc.d: Cannot stat: No such file or directory
tar: etc/profile.d: Cannot stat: No such file or directory
tar: etc/alternatives: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
|
| Comments |
| Comment by Gerrit Updater [ 12/Jan/16 ] |
|
Noopur Maheshwari (noopur.maheshwari@seagate.com) uploaded a new patch: http://review.whamcloud.com/17959 |
| Comment by Joseph Gmitter (Inactive) [ 14/Jan/16 ] |
|
James, |
| Comment by Andreas Dilger [ 14/Jan/16 ] |
|
Have you verified that this is related to trying to archive dangling symlinks from the source /etc folder, or what is the source of the error? Have you tried using "tar --ignore-failed-read -cf" to avoid an error on tar during read? It may also be that these errors are generated at restore time because the files are being deleted during cleanup while tar is still running. |
| Comment by Noopur Maheshwari (Inactive) [ 03/Feb/16 ] |
|
Hello Andreas, Dangling symlinks do not cause tar to fail. I created a dangling symlink in a temporary folder and ran tar on that folder; tar did not fail. |
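The check described above can be sketched as follows. This is a minimal reproduction, not the commands actually run for the ticket: the temporary directory layout and the names `workdir`, `src`, and `dangling` are assumptions chosen for illustration. By default tar stores a symlink as a link and does not follow it, so no stat error is expected.

```shell
#!/bin/sh
# Hypothetical reproduction: archive a directory that holds a dangling symlink.
# All paths and names below are illustrative, not taken from the test suite.
workdir=$(mktemp -d)
mkdir "$workdir/src"

# The link target does not exist, so the symlink is dangling.
ln -s /nonexistent/target "$workdir/src/dangling"

# Without -h/--dereference tar archives the link itself and never stats the target.
tar cf "$workdir/out.tar" -C "$workdir" src
tar_status=$?

# List the archive to confirm the link was stored.
listing=$(tar tf "$workdir/out.tar")

echo "tar exit status: $tar_status"
echo "$listing"

rm -rf "$workdir"
```

An exit status of 0 here is consistent with the statement that dangling symlinks alone do not explain the "Cannot stat" errors in the log.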
| Comment by James Nunez (Inactive) [ 22/Feb/16 ] |
|
Noopur - In the patch, you stated "Changing directory to /tmp does not help in this case. We see these tar failures without Lustre mounted as well. There is a problem with the tar utility, OS or VM (kvm or vmware). This isn't a lustre problem. Abandoning." So, I am closing this ticket as "Not a Bug". |
| Comment by Noopur Maheshwari (Inactive) [ 29/Feb/16 ] |
|
Hello James, I figured out that it isn't a tar utility issue; it is a test case issue. kill -0, used in the test case, only checks whether a process exists and whether the caller has permission to send it a signal; it does not deliver a signal to the process. The tar process runs in an infinite loop, and the removal/cleanup of files interferes with it and causes tar to fail. Could you please reopen the ticket? Thanks |
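The kill -0 behaviour mentioned above can be checked in isolation. The sketch below is not the test-framework code and not the change made in the patch; it uses a `sleep` process as a stand-in for the tar loop, and the variable names are assumptions. It shows that `kill -0` leaves the target running, and that the process only goes away after a real signal is sent and the process is reaped with `wait`.

```shell
#!/bin/sh
# Stand-in for the background tar loop (hypothetical; the real test runs tar).
sleep 30 &
pid=$!

# kill -0 sends no signal: it only reports whether the process can be signalled.
if kill -0 "$pid" 2>/dev/null; then alive_first=yes; else alive_first=no; fi

# A second probe still succeeds, because the first one did not stop anything.
if kill -0 "$pid" 2>/dev/null; then alive_second=yes; else alive_second=no; fi

# Actually terminate the process (default SIGTERM) and reap it before cleanup.
kill "$pid"
wait "$pid" 2>/dev/null

# After termination and reaping, the probe fails.
if kill -0 "$pid" 2>/dev/null; then alive_after=yes; else alive_after=no; fi

echo "first probe: $alive_first, second probe: $alive_second, after kill+wait: $alive_after"
```

A cleanup step that removes files after only probing with `kill -0` would race with the still-running process, which matches the "Cannot stat" errors in the description.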
| Comment by Gerrit Updater [ 01/Mar/16 ] |
|
Noopur Maheshwari (noopur.maheshwari@seagate.com) uploaded a new patch: http://review.whamcloud.com/18732 |
| Comment by Gerrit Updater [ 11/May/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18732/ |
| Comment by Joseph Gmitter (Inactive) [ 13/May/16 ] |
|
Landed to master for 2.9.0 |